OpenAI and Google's double standard: training large models on other people's data while never letting their own flow out
Editors: Du Wei, Zi Wen
In the era of generative AI, big tech companies are taking a "do as I say, not as I do" approach to consuming online content. To a large extent, this is a double standard and an abuse of their dominant position.
At the same time, as large language models (LLMs) have become the mainstream direction of AI development, established companies and startups alike are sparing no effort to build their own large models, and training data is a key prerequisite for model quality.
Recently, according to Insider, Microsoft-backed OpenAI, Google, and Google-backed Anthropic have for years been using online content from other websites and companies to train their generative AI models. All of this was done without asking for specific permission, and it will form part of a brewing legal battle that will determine the future of the web and how copyright law applies in this new era.
These companies are smart, but also very hypocritical
There is solid evidence that big tech companies use other people's online content while forbidding others to use their own: it is written into the terms of service and terms of use of some of their products.
Let's first look at Claude, an AI assistant similar to ChatGPT launched by Anthropic. The system can handle tasks such as summarization, search, assisted writing, question answering, and coding. It was recently upgraded: its context window was extended to 100K tokens, and its processing speed was significantly improved.
Claude's terms of service state that you may not access or use the service in certain ways (some of which are listed below), and that where any of these restrictions are inconsistent or unclear with the Acceptable Use Policy, the latter prevails:
Claude Terms of Service address:
Likewise, Google's Generative AI Terms of Use states, "You may not use the Service to develop machine learning models or related techniques."
What about OpenAI's terms of use? Similar to Google, "You may not use the output of this service to develop models that compete with OpenAI."
These companies are smart enough to know that high-quality content is critical to training new AI models, so it makes sense not to allow others to use their output in this way. But how do they explain their reckless use of other people's data to train their own models?
OpenAI, Google, and Anthropic did not respond to Insider's requests for comment.
Reddit, Twitter and others: Enough is enough
In fact, other companies were not happy when they realized what was happening. In April, Reddit, whose data has been used for years to train AI models, announced plans to start charging for access to that data.
Reddit CEO Steve Huffman said, "Reddit's data corpus is so valuable that we can't give that value away for free to the largest companies in the world."
Also in April, Musk accused OpenAI's main backer, Microsoft, of illegally using Twitter's data to train AI models. "Litigation time," he tweeted.
OpenAI CEO Sam Altman has tried to get ahead of the issue by exploring new AI models that respect copyright. "We're trying to develop a model where, if an AI system uses your content or uses your style, you get paid for it," he said recently, as reported by Axios.
Publishers (including Insider) have a vested interest here. In addition, some publishers, including News Corp, are already pushing technology companies to pay to use their content to train AI models.
The current approach to training AI models "breaks" the web
Some former Microsoft executives agree there is a problem. Microsoft veteran and well-known software developer Steven Sinofsky believes that the current way of training AI models "breaks" the web.
He wrote on Twitter: "In the past, crawling data was part of a bargain in exchange for click-throughs. Now it is used only to train a model and brings no value to creators or copyright owners."
Perhaps, as more companies wake up to what is happening, this lopsided use of data in the generative AI era will soon change.
Original Link: