The long-text processing capability of large models has increased a hundredfold, presenting both challenges and opportunities.

Large model vendors are competing to break through long text capabilities

Large models are enhancing their ability to process long texts at an astonishing speed. Several top large-model companies and research institutions, in China and abroad, have made expanding context length a key focus of their upgrades.

From 4,000 tokens to 400,000 tokens, the context length of large models has grown a hundredfold in a short period. OpenAI has upgraded GPT-4 several times, raising its context input length to 32,000 tokens. Anthropic has expanded the context length of its model Claude to 100,000 tokens. In China, Kimi Chat, released by Moonshot AI (月之暗面), supports input of 200,000 Chinese characters, equivalent to roughly 400,000 tokens.

Enhanced long-text processing does not just mean the model can read longer documents; it also promotes the application of large models in professional fields such as finance, law, and scientific research, where capabilities like long-document summarization, reading comprehension, and question answering stand to improve significantly.

However, a longer context window is not automatically better. Research shows there is no direct causal relationship between the context length a model supports and its effectiveness; what matters more is how well the model actually uses the contextual content it is given.

Currently, long-text technology faces an "impossible triangle" of text length, attention, and computing power. As the input grows longer, the model struggles to focus on the key information, while maintaining sufficient attention over a long input demands large amounts of compute.
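The compute side of this triangle can be made concrete with a minimal NumPy sketch (illustrative only, not any vendor's implementation): standard self-attention forms an n × n score matrix over a sequence of length n, so memory and compute grow quadratically with context length.

```python
import numpy as np

def naive_attention(q, k, v):
    """Single-head self-attention over (n, d) arrays q, k, v.

    The (n, n) score matrix below is the quadratic term: doubling the
    context length quadruples its size, which is why long contexts are
    expensive for unmodified attention.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                     # shape (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax rows
    return weights @ v                                # shape (n, d)

# Illustrative: entry counts of the score matrix at two context lengths.
for n in (4_000, 8_000):
    print(f"n={n}: score matrix holds {n * n:,} entries")
```

At 400,000 tokens, that score matrix would hold 1.6 × 10^11 entries per head per layer, which is why the workarounds below avoid materializing it in full.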

To overcome this dilemma, researchers have proposed various solutions:

  1. Use external tools to assist in processing long texts, for example by splitting a long document into multiple shorter chunks that are processed separately.

  2. Optimize the computation of the self-attention mechanism, as in LongLoRA.

  3. Optimize the model itself, as in LongLLaMA, which achieves longer-sequence extrapolation through fine-tuning.
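The first workaround above can be sketched in a few lines. This is a minimal illustration of character-based chunking with overlap; the chunk size and overlap values are arbitrary assumptions, not any specific tool's defaults, and real pipelines would typically split on token counts or sentence boundaries instead.

```python
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into overlapping chunks of at most max_chars characters.

    The overlap carries some context across chunk boundaries; each chunk
    could then be summarized separately and the partial summaries merged
    (a simple map-reduce pattern for long documents).
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap
    return chunks

doc = "x" * 5000
pieces = chunk_text(doc)
print([len(p) for p in pieces])  # three chunks, each within the limit
```

Chunking keeps each call within the model's context window, but at the cost of losing cross-chunk dependencies, which is why the attention- and model-level approaches are also pursued.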

Despite the challenges it still faces, long-text technology is undoubtedly key to the further application of large models. Going forward, large-model vendors will need to find the optimal balance among text length, attention, and computing power to achieve breakthroughs in long-text processing.
