DeepSeek-R1 generates nearly four times as much false information as V3, raising concerns about AI agent tokens in the crypto sector.

05-12


DeepSeek-R1, the flagship reasoning model from the Chinese lab DeepSeek, recorded a "hallucination" rate (generating false information) of 14.3% on Vectara's HHEM 2.1 benchmark. That is nearly four times the rate of its predecessor DeepSeek-V3, a model not focused on reasoning, which scored only 3.9%.
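As a quick check of the "nearly four times" figure, the ratio can be computed directly from the two reported rates. The function and variable names below are illustrative only.

```python
# Hallucination rates reported by Vectara (HHEM 2.1), in percent.
R1_RATE = 14.3
V3_RATE = 3.9


def rate_ratio(a: float, b: float) -> float:
    """Return how many times larger rate a is than rate b."""
    return a / b


ratio = rate_ratio(R1_RATE, V3_RATE)
gap_points = R1_RATE - V3_RATE
print(f"R1 / V3 = {ratio:.2f}x, gap = {gap_points:.1f} percentage points")
# prints: R1 / V3 = 3.67x, gap = 10.4 percentage points
```

The ratio is about 3.67, which is why the figure is described as "nearly" rather than exactly four times.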

This gap has raised pressing questions in the cryptocurrency community. A growing number of AI agent tokens rely on reasoning-capable LLMs for automated trading, signal generation, and on-chain transaction execution.

Vectara's data shows that R1 "over-supplements," which drives up its rate of false information.

Vectara tested both DeepSeek models with its own HHEM 2.1 evaluation tool to measure hallucination rates, then re-tested them using Google's FACTS methodology. R1 produced more false or unsupported statements than V3 in every test configuration.

The cause is not simply the depth of reasoning. Vectara analysts found that R1 often "over-supplements," meaning it adds information of its own accord that is not present in the source content.

These added details are sometimes true in themselves, but because they do not appear in the source data, they are still counted as hallucinations. The effect is to slip unsupported information into answers that otherwise look logical and correct.
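To make the idea of "over-supplementing" concrete, here is a toy sketch that flags answer sentences containing numbers or names that never appear in the source. It is not how HHEM works (HHEM is a trained evaluation model); the heuristic, function names, and sample text are all assumptions made for illustration.

```python
import re

WORD = r"[A-Za-z0-9][A-Za-z0-9.\-]*"


def specifics(sentence: str) -> set[str]:
    """Extract numbers and capitalized words, skipping the sentence-initial word."""
    found = set()
    for i, word in enumerate(re.findall(WORD, sentence)):
        word = word.rstrip(".")
        if any(ch.isdigit() for ch in word) or (i > 0 and word[0].isupper()):
            found.add(word.lower())
    return found


def unsupported_sentences(source: str, answer: str) -> list[str]:
    """Return answer sentences whose specifics are absent from the source."""
    source_words = {w.rstrip(".").lower() for w in re.findall(WORD, source)}
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if specifics(sentence) - source_words:
            flagged.append(sentence)
    return flagged


# Made-up example text: the second sentence adds details the source never states.
source = "Acme released version 2.0 of its wallet on Monday."
answer = "Acme released version 2.0 of its wallet. The wallet was audited by Zenith in 2023."
print(unsupported_sentences(source, answer))
# prints: ['The wallet was audited by Zenith in 2023.']
```

The flagged sentence might well be true in the real world, but it is unsupported by the source, which is the distinction a grounding benchmark cares about.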

Vectara published the finding on X.

“DeepSeek-R1 has a hallucination rate of 14.3%, nearly four times higher than DeepSeek-V3,” Vectara emphasized in a post.

This phenomenon is not unique to DeepSeek. Many observers report that reasoning-focused models from other labs show similar trade-offs. The additional reinforcement learning used to develop chain-of-thought capabilities also encourages models to produce bolder, more confident responses.

Why do AI tokens in crypto face this trade-off?

The cryptocurrency market now has hundreds of AI agent tokens, notably Virtuals Protocol (VIRTUAL), ai16z (AI16Z), and aixbt (AIXBT).

The sector as a whole has grown by approximately 39.4% over the last 30 days. Virtuals alone has surpassed a market capitalization of $576 million.

Virtuals Protocol (VIRTUAL) price performance. Source: CoinGecko

Most of these AI agents integrate large language models (LLMs) into their automation tools, allowing the agents to post on social media, execute transactions, create tokens, or provide market insights.

If a platform's AI "invents" prices, partnerships, or contract addresses, the consequences can play out directly on-chain.

A BeInCrypto analysis of AIXBT found that the agent had promoted 416 tokens with an average return of 19%. However, this same mode of operation can put followers at risk if the model is wrong.

The level of risk rises with the degree of agent autonomy. Agents that only read data and summarize market sentiment are less risky than agents that manage funds themselves.

Reasoning-driven models are increasingly favored for AI agents that perform multiple complex actions in quick succession. Yet it is precisely in this use case that the 14.3% hallucination rate measured by Vectara poses the most serious risk.

A hallucinated fact early in an agent's chain of thought can propagate, influencing every subsequent decision.
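A rough way to see why multi-step agents are more exposed is to compute the chance that at least one step in a chain goes wrong. The sketch below rests on two loud assumptions: that each step fails independently, and that the benchmark rates can stand in for a per-step rate. Vectara's figures come from a summarization test, not from agent steps, so the numbers are illustrative only.

```python
def chance_of_any_hallucination(per_step_rate: float, steps: int) -> float:
    """P(at least one hallucinated step), assuming independent steps."""
    return 1.0 - (1.0 - per_step_rate) ** steps


# Assumed per-step rates borrowed from the benchmark figures (an assumption).
for steps in (1, 3, 5, 10):
    r1_like = chance_of_any_hallucination(0.143, steps)
    v3_like = chance_of_any_hallucination(0.039, steps)
    print(f"{steps:>2} steps: R1-like {r1_like:.1%}, V3-like {v3_like:.1%}")
```

Under these assumptions, a five-step chain would contain at least one hallucinated step roughly 54% of the time at a 14.3% per-step rate, versus about 18% at 3.9%.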

LeCun argues that the problem lies in the architecture of the model.

Yann LeCun, Meta's chief AI scientist, has long argued that autoregressive LLMs cannot completely eliminate hallucinations. In his view, the architecture itself lacks the ability to deeply understand the real world.

Reinforcement learning on chain-of-thought can partially mask this weakness in narrow domains such as mathematics and programming. However, the root cause remains unresolved.

Several other leading AI labs disagree. They argue that the industry has made significant progress in reducing hallucination rates through better retrieval, post-training refinement, and the addition of validation models. Even so, hands-on reports from developers often match what the leaderboards show.

AI researcher xlr8harder, sharing his experience testing R1 on X, summarized day-to-day use as follows:

“DeepSeek R1 has a fragmented view of its thought chain … so it frequently ‘smokes me up’ with hallucinatory information,” xlr8harder wrote.

For AI agent developers in crypto, the key issue is risk management, not debating architectural philosophy. Designing agents that verify all information from the model through a validation step can help minimize errors.
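A validation step of this kind can be sketched as a gate between the model's output and any on-chain action. Everything here is hypothetical: the `ModelClaim` and `TrustedRecord` types, the 2% price tolerance, and the placeholder token and addresses are assumptions for illustration, not any project's actual design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelClaim:
    """What the LLM asserted (hypothetical structure)."""
    token: str
    contract_address: str
    price_usd: float


@dataclass(frozen=True)
class TrustedRecord:
    """Reference data from a source the agent trusts, e.g. an on-chain registry."""
    contract_address: str
    price_usd: float


def validate_claim(claim: ModelClaim, trusted: dict[str, TrustedRecord],
                   max_price_deviation: float = 0.02) -> tuple[bool, str]:
    """Reject the claim unless the address matches and the price is within tolerance."""
    record = trusted.get(claim.token)
    if record is None:
        return False, "unknown token"
    if claim.contract_address.lower() != record.contract_address.lower():
        return False, "contract address mismatch"
    if record.price_usd <= 0:
        return False, "no usable reference price"
    deviation = abs(claim.price_usd - record.price_usd) / record.price_usd
    if deviation > max_price_deviation:
        return False, f"price deviates by {deviation:.1%}"
    return True, "ok"


# Placeholder data, not real tokens or addresses.
trusted = {"EXAMPLE": TrustedRecord("0xabc0000000000000000000000000000000000001", 1.00)}
good = ModelClaim("EXAMPLE", "0xABC0000000000000000000000000000000000001", 1.01)
bad = ModelClaim("EXAMPLE", "0xdef0000000000000000000000000000000000002", 1.01)
print(validate_claim(good, trusted))  # (True, 'ok')
print(validate_claim(bad, trusted))   # (False, 'contract address mismatch')
```

The design choice is to fail closed: any claim that cannot be confirmed against the trusted source is rejected rather than executed.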

Similarly, agents that use simpler, more conservative models for financial decisions may achieve safer outcomes.

Subsequent evaluation rounds and later R1 releases will show whether the trade-off between reasoning ability and accuracy is narrowing.

For now, the gap between 14.3% and 3.9% is an operational detail that developers and retail investors should monitor. It could become a key factor in distinguishing AI agent tokens that offer a real product from tokens that deliver only on paper.


