The long-text processing capability of large models has grown a hundredfold, presenting both challenges and opportunities.
Large model vendors are racing to break through long-text limits
Large models are enhancing their ability to process long texts at an astonishing pace. Several top large-model companies and research institutions, in China and abroad, have made expanding context length a key focus of their upgrades.
From 4,000 tokens to 400,000 tokens, the text length large models can process has grown a hundredfold in a short period. OpenAI has upgraded repeatedly, raising GPT-4's context input length to 32,000 tokens. Anthropic has expanded the context length of its Claude model to 100,000 tokens. Kimi Chat, released by China's Moonshot AI, goes further, supporting input equivalent to 400,000 tokens, or about 200,000 Chinese characters.
Enhanced long-text processing means not only that models can read longer texts, but also that large models become more applicable in professional fields such as finance, law, and scientific research. For example, long-document summarization, reading comprehension, and question answering all improve significantly.
However, longer context is not automatically better. Research shows there is no direct causal relationship between the context length a model supports and its effectiveness; what matters more is how effectively the model uses the contextual content.
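One common way to probe how well a model actually uses its context is a "needle in a haystack" test: bury a known fact at varying depths in long filler text and check whether the model retrieves it. Below is a minimal sketch of such a harness; the helper names and the whitespace-word approximation of tokens are illustrative assumptions, not a specific benchmark's implementation.

```python
def make_haystack(needle: str, filler: str, n_words: int, depth: float) -> str:
    """Build a long prompt with `needle` buried at a relative `depth` (0.0-1.0).
    Length is measured in whitespace-split words as a rough token proxy."""
    base = filler.split()
    words = (base * (n_words // len(base) + 1))[:n_words]
    pos = int(depth * len(words))
    return " ".join(words[:pos] + [needle] + words[pos:])

def retrieval_passed(model_answer: str, expected: str) -> bool:
    """The model passes if its answer contains the expected fact verbatim."""
    return expected.lower() in model_answer.lower()

# Example: bury a fact a quarter of the way into ~1,000 words of filler.
prompt = make_haystack(
    needle="The secret code is 7421.",
    filler="Long documents contain much routine text. ",
    n_words=1000,
    depth=0.25,
)
```

Sweeping `depth` and `n_words` while scoring with `retrieval_passed` reveals whether a model's effective context matches its advertised context.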
Currently, long-text technology faces an "impossible triangle" among text length, attention, and computing power: as text grows longer, the model struggles to focus on key information, while maintaining sufficient attention over the full input demands large amounts of compute.
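The computing-power side of this triangle follows directly from how self-attention scales: the attention score matrix grows quadratically with sequence length. A rough back-of-the-envelope sketch makes the hundredfold length jump concrete; the FLOP formula is a common approximation (QK^T and score-times-V each cost about 2·n²·d multiply-adds), and `d_model = 4096` is an assumed hidden size.

```python
def attention_flops(seq_len: int, d_model: int = 4096) -> int:
    """Approximate FLOPs for one self-attention layer's n^2 terms:
    the QK^T score matrix plus the score-by-V product."""
    return 2 * 2 * seq_len**2 * d_model

# A 100x longer sequence costs 10,000x the attention compute.
for n in (4_000, 32_000, 100_000, 400_000):
    print(f"{n:>7} tokens -> {attention_flops(n):.2e} FLOPs per layer")
```

This quadratic blow-up is exactly why naive scaling of context windows is infeasible and motivates the workarounds below.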
To overcome this dilemma, researchers have proposed various solutions:
- Use external tools to assist in processing long texts, for example by splitting a long document into several shorter ones.
- Optimize the computation of the self-attention mechanism, as LongLoRA does.
- Optimize the model itself, as LongLLaMA does by fine-tuning for longer sequence extrapolation.
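The first approach, splitting a long text into shorter pieces, is often implemented as a map-reduce pipeline: summarize each chunk with a short-context model, then summarize the concatenated partial summaries. A minimal sketch follows; the `summarize` callback is a hypothetical stand-in for any short-context model call, and the word-based chunking with overlap is one simple choice among many.

```python
from typing import Callable, List

def split_into_chunks(text: str, chunk_size: int, overlap: int = 200) -> List[str]:
    """Split text into overlapping word chunks so content cut at a
    boundary still appears complete in the next chunk."""
    words = text.split()
    step = max(1, chunk_size - overlap)  # guard against non-positive step
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        start += step
    return chunks

def map_reduce_summarize(text: str,
                         summarize: Callable[[str], str],
                         chunk_size: int = 3000) -> str:
    """Map: summarize each chunk independently.
    Reduce: summarize the joined partial summaries."""
    partials = [summarize(c) for c in split_into_chunks(text, chunk_size)]
    return summarize(" ".join(partials))
```

The trade-off is that information spanning chunk boundaries can be lost, which is why the overlap parameter exists, and why the other two approaches tackle the attention mechanism itself instead.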
Despite the challenges it still faces, long-text technology is undoubtedly key to the further application of large models. Going forward, large model vendors will need to find the optimal balance among text length, attention, and computing power to achieve real breakthroughs in long-text processing.