Author: Zhou Yue, Economic Observer

Foreword
I. For companies like Google, Meta, and Anthropic, reproducing reasoning models similar to DeepSeek-R1 is not difficult. But in a battle of giants, even a small decision-making mistake can cause them to miss the moment.
II. The net computing-power cost of the DeepSeek-V3 model is about $5.58 million, which is already highly efficient. Beyond the cost, what excites the AI industry more is DeepSeek's unique technical path, its algorithmic innovation, and the sincerity of its open-sourcing.
III. No large model can escape the "hallucination" problem, and DeepSeek is no exception. Some users say that because DeepSeek's expressive ability and logical reasoning are stronger, the hallucinations it produces are harder to identify.
Over the past few weeks, DeepSeek has caused a storm globally.
The most visible reaction came in the US stock market: on January 27, AI and chip stocks plummeted, with Nvidia closing down more than 17% and shedding $589 billion in market value in a single day, the largest one-day loss in US stock market history.
In the eyes of some self-media commentators and the general public, DeepSeek is the "most exciting protagonist of 2025", with four major "exciting points":
First, "mysterious forces overtake the curve". DeepSeek is a "young" large model company established in 2023, and its previous discussion was not as much as any domestic or foreign large factory or star startup company. Its parent company, Phantasy Quantitative, is mainly engaged in quantitative investment. Many people are puzzled that the leading AI company in China actually comes from a private equity firm, which can be described as "a random punch kills the master".
Second, "small force creates miracles". The training cost of the DeepSeek-V3 model is about $5.58 million, less than one-tenth of the OpenAI GPT-4o model, but its performance is already close. This is interpreted as DeepSeek overturning the "Bible" of the AI industry - the Scaling Law. This law means that by increasing the number of training parameters and computing power, the model performance can be improved, which usually means spending more money on annotating high-quality data and purchasing computing power chips, which is vividly called "great power creates miracles".
Third, "Nvidia's moat disappears". DeepSeek mentioned in the paper that it uses customized PTX (Parallel Thread Execution) language programming to better release the performance of the underlying hardware. This is interpreted as DeepSeek "bypassing the Nvidia CUDA computing platform".
Fourth, "foreigners are convinced". On January 31st, Nvidia, Microsoft, Amazon and other overseas AI giants all accessed DeepSeek overnight. Suddenly, statements such as "China's AI has surpassed the US", "The era of OpenAI is over", and "AI computing power demand has disappeared" have emerged one after another, almost unanimously praising DeepSeek and mocking the AI giants in Silicon Valley.
However, the panic in the capital market did not last. On February 6, Nvidia's market value returned to $3 trillion, and US chip stocks broadly rose. In hindsight, the four "exciting points" above are mostly misreadings.
First, by the end of 2017, Phantasy Quantitative had already adopted AI models for almost all of its quantitative strategies. At that time, the AI field was experiencing the most important deep learning wave, so Phantasy Quantitative was closely following the forefront.
In 2019, Phantasy Quantitative's deep learning training platform "Firefly II" was already equipped with about 10,000 Nvidia A100 GPUs. Ten thousand cards is the computing-power threshold for training large models in-house; although this cannot be equated with DeepSeek's resources, Phantasy Quantitative obtained its entry ticket to the large-model race earlier than many Internet giants.
Second, DeepSeek noted in the V3 technical report that the $5.58 million "does not include the cost of preliminary research and ablation experiments related to architecture, algorithms or data". This means DeepSeek's actual total outlay was higher.
Multiple AI industry experts and practitioners told the Economic Observer that DeepSeek has not changed the industry rules, but has adopted "smarter" algorithms and architectures to save resources and improve efficiency.
Third, the PTX language was developed by Nvidia and is part of the CUDA ecosystem. DeepSeek's approach can squeeze more performance out of the hardware, but the programs must be rewritten whenever the target task changes, which is a huge workload.
Fourth, companies like Nvidia, Microsoft, and Amazon have only deployed DeepSeek's models on their own cloud services. Users pay on-demand to the cloud service providers, getting a more stable experience and more efficient tools, which is a win-win approach.
Since February 5th, domestic cloud providers such as Huawei Cloud, Tencent Cloud, and Baidu Cloud have also successively launched DeepSeek models.
In addition to the above four "exciting points", the public holds many other misunderstandings about DeepSeek. While such "exciting" interpretations can be stimulating to read, they can also obscure the DeepSeek team's algorithmic and engineering innovations, as well as its persistent open-source spirit, which have a deeper impact on the technology industry.
US AI giants are not defeated, but made decision-making mistakes
When users use the DeepSeek App or web version, clicking the "Deep Thinking (R1)" button will display the complete thinking process of the DeepSeek-R1 model, which is a completely new experience.
Since the advent of ChatGPT, the vast majority of large models directly output answers.
One DeepSeek-R1 exchange went viral: when a user asks "Which university is better, A University or Tsinghua University?", DeepSeek first answers "Tsinghua University"; when the user follows up with "I'm a student at A University, please answer again", the answer becomes "A University is good". This dialogue was posted on social media, triggering a collective exclamation that "AI actually understands human feelings and social etiquette".
Many users feel that the thinking process DeepSeek displays is like that of a "person" brainstorming while quickly jotting ideas down on scratch paper. It refers to itself as "I", reminds itself "to avoid making the user feel that their school is being belittled" and to "use positive and affirmative words to praise his alma mater", and "writes" down every idea that comes to mind.
On February 2, DeepSeek topped the app stores in 140 countries and regions, and tens of millions of users were able to try the deep thinking function. As a result, in many users' perception, displaying an AI's thinking process is a DeepSeek "first".
In fact, the OpenAI o1 model is the pioneer of the reasoning paradigm. OpenAI released the o1 model preview version in September 2024 and the official version in December. But unlike the free-to-experience DeepSeek-R1 model, the OpenAI o1 model can only be used by a few paying users.
Liu Zhiyuan, a tenured associate professor at Tsinghua University and chief scientist of Mianbi Intelligence, believes that the global success of the DeepSeek-R1 model is closely related to OpenAI's wrong decisions. After releasing the o1 model, OpenAI neither open-sourced it nor disclosed the technical details, and charged high prices, so it never went viral and global users never felt the shock of deep thinking. This strategy effectively ceded the position once occupied by ChatGPT to DeepSeek.
Technically speaking, the current mainstream paradigms of large models are pre-training models and reasoning models. The more well-known OpenAI GPT series and the DeepSeek-V3 model belong to the pre-training model category.
OpenAI o1 and DeepSeek-R1 belong to the reasoning-model category, a newer paradigm in which the model decomposes a complex problem through a chain of thought, reflects step by step, and then arrives at relatively accurate and insightful results.
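To make this concrete, here is a minimal Python sketch of how an application might separate a reasoning model's visible "thinking" from its final answer, assuming the model wraps its chain of thought in <think>...</think> tags (the convention DeepSeek-R1's outputs follow). The helper function and demo string are illustrative, not DeepSeek's own code.

    import re

    def split_reasoning(raw_output: str):
        # Return (thinking, answer), assuming <think>...</think> wraps the chain of thought.
        m = re.search(r"<think>(.*?)</think>", raw_output, re.S)
        if not m:
            return "", raw_output.strip()
        return m.group(1).strip(), raw_output[m.end():].strip()

    demo = "<think>The user asks for 12 x 13. 12 x 13 = 156.</think>12 x 13 = 156."
    thinking, answer = split_reasoning(demo)  # thinking holds the draft; answer holds the final reply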
Guo Chengkai, who has been engaged in AI research for decades, told the Economic Observer that the reasoning paradigm is a relatively easy lane for overtaking. As a new paradigm, it iterates quickly and can deliver significant improvements with relatively little computation. The prerequisite is a powerful pre-trained model; through reinforcement learning, the potential of a large-scale pre-trained model can be deeply explored, approaching the ceiling of large-model capability under the reasoning paradigm.
For companies like Google, Meta, and Anthropic, reproducing reasoning models similar to DeepSeek-R1 is not a difficult task. However, in the battle of giants, even a small decision-making mistake can cause them to miss the opportunity.
An obvious example: on February 6, Google released a reasoning model, Gemini 2.0 Flash Thinking, with a lower price and a longer context window, and it outperformed R1 in several tests, but it did not cause a wave like DeepSeek-R1.
What is most worth discussing is not low cost, but technological innovation and "sincere" open source
The most widespread discussion about DeepSeek has always been about "low cost". Since the release of the DeepSeek-V2 model in May 2024, the company has been jokingly referred to as the "Pinduoduo of the AI industry".
The journal Nature reported that Meta spent more than $60 million to train its latest AI model, Llama 3.1 405B, while training DeepSeek-V3 cost less than one-tenth of that. This indicates that efficient use of resources is more important than pure computational scale.
Some institutions believe DeepSeek's training costs have been underestimated. The AI and semiconductor industry analysis firm SemiAnalysis stated in a report that DeepSeek's quoted pre-training cost is far from the actual total investment in the model. By the firm's estimates, DeepSeek's GPU-related spending totaled about $2.573 billion, including $1.629 billion on server purchases and $944 million in operating expenses.
However, the net computing power cost of the DeepSeek-V3 model is about $5.58 million, which is already highly efficient.
Beyond the cost, what excites the AI industry more is DeepSeek's unique technical path, algorithmic innovation, and sincere open source.
Guo Chengkai explained that many current methods rely on the classic training recipe for large models, such as supervised fine-tuning (SFT), which requires large amounts of annotated data. DeepSeek proposed a new route: improving reasoning capability through large-scale reinforcement learning (RL), which effectively opens up a new research direction. In addition, multi-head latent attention (MLA) is a key DeepSeek innovation that significantly reduces inference cost.
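To illustrate the contrast with SFT, here is a minimal Python sketch of the kind of rule-based reward signal described for this style of reinforcement learning on verifiable tasks: an accuracy reward that checks the final answer against a reference, plus a format reward that checks the reasoning is wrapped in <think> tags. The exact-match check and function names are simplifying assumptions for illustration, not DeepSeek's code.

    import re

    def accuracy_reward(model_answer: str, reference: str) -> float:
        # 1.0 only if the model's final answer matches a checkable reference (e.g., a math result).
        return 1.0 if model_answer.strip() == reference.strip() else 0.0

    def format_reward(full_output: str) -> float:
        # 1.0 if the output puts its reasoning inside <think>...</think> tags.
        return 1.0 if re.search(r"<think>.+?</think>", full_output, re.S) else 0.0

    def total_reward(full_output: str, model_answer: str, reference: str) -> float:
        # The RL step then updates the policy to increase this scalar signal;
        # no human-annotated reasoning traces are required.
        return accuracy_reward(model_answer, reference) + format_reward(full_output)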
Zhai Jidong, a professor at Tsinghua University and chief scientist of Qingcheng Extreme Intelligence, believes that what impresses him most about DeepSeek is its innovation on the Mixture of Experts (MoE) architecture, with 256 routed experts and 1 shared expert in each layer. Earlier work balanced expert load with an auxiliary loss, which disturbs the gradients and hurts model convergence. DeepSeek proposed an auxiliary-loss-free ("loss-free") approach that lets the model converge effectively while still achieving load balancing.
Zhai Jidong emphasized: "The DeepSeek team is more willing to innovate. I think it's very important not to completely follow the strategies of other countries, but to have your own thinking."
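A rough Python/NumPy sketch of the loss-free balancing idea, under simplifying assumptions (random router scores, softmax gating over the selected experts): each routed expert carries a bias that is used only when choosing experts, not when weighting their outputs, and the bias is nudged after each batch so overloaded experts get picked less often. This illustrates the concept, not DeepSeek's implementation.

    import numpy as np

    def route(scores, bias, top_k=8):
        # scores: (tokens, experts) router affinities; bias: (experts,) balancing bias.
        biased = scores + bias                              # the bias affects selection only
        chosen = np.argsort(-biased, axis=1)[:, :top_k]     # top-k routed experts per token
        gate = np.take_along_axis(scores, chosen, axis=1)   # gating weights use the raw scores
        gate = np.exp(gate) / np.exp(gate).sum(axis=1, keepdims=True)
        return chosen, gate                                 # a shared expert would see every token regardless

    def update_bias(bias, chosen, n_experts, step=1e-3):
        # Count how often each expert was picked; push overloaded ones down, underloaded ones up.
        load = np.bincount(chosen.ravel(), minlength=n_experts)
        return bias - step * np.sign(load - load.mean())

    scores = np.random.randn(16, 256)   # 16 tokens, 256 routed experts per layer (as in V3)
    bias = np.zeros(256)
    chosen, gate = route(scores, bias)
    bias = update_bias(bias, chosen, 256)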
What excites AI practitioners even more is DeepSeek's "sincere" open source, which has given a shot in the arm to an open-source community that had grown somewhat sluggish.
Prior to this, the strongest pillar of the open-source community was Meta's 405-billion-parameter model Llama 3.1. However, many developers told the Economic Observer that after trying it, they still felt Llama was at least one generation behind closed-source models such as GPT-4, "almost making them lose confidence".
But DeepSeek's open source has done three things to restore the confidence of developers:
First, it directly open-sourced the 671B-parameter model and released distilled models based on multiple popular architectures, like "a good teacher teaching more good students".
Second, the published papers and technical reports contain a wealth of technical detail. The papers on the V3 and R1 models, at 50 and 150 pages respectively, are considered the "most detailed technical reports" in the open-source community. This means individuals or companies with similar resources can reproduce the models by following this "instruction manual". Many developers who have read them praise them as "elegant" and "solid".
Third, what is more worth mentioning is that DeepSeek-R1 adopts the MIT license, which means that anyone can freely use, modify, distribute and commercialize the model, as long as the original copyright notice and MIT license are retained in all copies. This means that users can more freely utilize the model weights and outputs for secondary development, including fine-tuning and distillation.
Although Llama allows secondary development and commercial use, it has added some restrictive conditions in the license, such as additional restrictions on enterprise users with more than 700 million monthly active users, and explicitly prohibiting the use of Llama's output results to improve other large models.
A developer told the Economic Observer that he has been using the DeepSeek-V2 version for code-generation work. Besides the very low price, the model's performance is outstanding: among all the models he has used, only OpenAI's and DeepSeek's can output valid logic nested more than 30 levels deep. This means professional programmers can use such tools to help generate 30% to 70% of their code.
Multiple developers emphasized to the Economic Observer the importance of DeepSeek's open source. Before this, industry leaders like OpenAI and Anthropic were like Silicon Valley aristocrats. DeepSeek has opened knowledge up to everyone and made it more democratized, an important form of empowerment: developers in the global open-source community can stand on DeepSeek's shoulders, while DeepSeek can gather the ideas of the world's top makers and geeks.
Turing Award winner and Meta Chief Scientist Yann LeCun believes that the correct interpretation of DeepSeek's rise is that open source models are surpassing closed-source models.
DeepSeek is good, but not perfect
No large model can escape the "hallucination" problem, and DeepSeek is no exception. Some users say that because DeepSeek's expressive ability and logical reasoning are stronger, the hallucinations it produces are harder to identify.
A netizen on social media said he asked DeepSeek a question about route planning in a certain city. DeepSeek explained some reasons, cited urban-planning protection regulations and data, and invoked the concept of a "silent zone", making the answer seem very reasonable.
For the same question, the answers of other AIs were not so profound, and people could see at a glance that they were "talking nonsense".
After checking the full text of the protection regulations, this user found that there was no such thing as a "silent zone". He believes that "DeepSeek is building a 'hallucination Great Wall' on the Chinese Internet."
Guo Chengkai has encountered similar problems: DeepSeek-R1's answers would "misattribute" some proper nouns, especially on open-ended questions, producing a more severe "hallucination" experience. He speculates that this may be because the model's reasoning ability is so strong that it associates large amounts of knowledge and data together.
He suggests turning on the web search function when using DeepSeek, and paying attention to, and correcting, the displayed thinking process. In addition, when using reasoning models it is best to keep prompts as concise as possible: the longer the prompt, the more the model will free-associate.
Liu Zhiyuan found that DeepSeek-R1 often uses high-flown vocabulary, such as "quantum entanglement" and "entropy increase/decrease", which it applies across all kinds of fields. He guesses this may stem from some mechanism setting in the reinforcement learning. In addition, R1's reasoning performance on general-domain tasks without ground truth is still not ideal, as reinforcement-learning training cannot guarantee generalization.
In addition to the common "hallucination" problem, there are also some persistent issues that DeepSeek needs to solve.
One is the potential for ongoing disputes over "distillation". Model or knowledge distillation typically involves having a stronger model generate responses that are then used to train a weaker model, in order to improve the weaker model's performance.
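As a deliberately simplified picture of what response-based distillation means in practice, here is a short Python sketch: a stronger "teacher" model generates answers to a set of prompts, and those answers become supervised fine-tuning data for a smaller "student". The callables teacher_generate and student_train_step are hypothetical placeholders, not any vendor's API.

    def build_distillation_set(teacher_generate, prompts):
        # teacher_generate(prompt) -> text produced by the stronger model.
        return [(p, teacher_generate(p)) for p in prompts]

    def distill(student_train_step, dataset, epochs=1):
        # Ordinary supervised fine-tuning on the teacher's responses:
        # the student learns to imitate the stronger model's answers.
        for _ in range(epochs):
            for prompt, response in dataset:
                student_train_step(prompt, response)

    # Usage sketch:
    # data = build_distillation_set(teacher_generate=big_model_sample, prompts=my_prompts)
    # distill(student_train_step=small_model_step, dataset=data)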
On January 29, OpenAI accused DeepSeek of using model distillation technology to train its own models based on OpenAI's technology. OpenAI claimed that there is evidence that DeepSeek used its proprietary models to train its own open-source models, but did not provide further evidence. OpenAI's terms of service stipulate that users cannot "copy" any of its services or "use its outputs to develop models that compete with OpenAI".
Guo Chengkai believes that using distillation to verify and optimize one's own model against leading models is common practice in large-model training. DeepSeek has already open-sourced its models, so further verification is straightforward. Moreover, OpenAI's own early training data carries legal issues; if it wants to take action against DeepSeek, it will need to elevate the matter to the legal level, defend the validity of its terms, and make the content of those terms more explicit.
Another problem DeepSeek needs to solve is how to push forward the pre-training of models with even larger parameter counts. Even OpenAI, which has more high-quality annotated data and more computing resources, has not yet released its larger-scale pre-trained model GPT-5. Whether DeepSeek can continue to create miracles remains to be seen.
Nevertheless, the hallucinations DeepSeek produces are in a sense driven by the same curiosity, which may be the two-sided nature of innovation. As its founder Liang Wenfeng has said: "Innovation is not entirely driven by business; it also requires curiosity and creativity. China's AI cannot always follow; someone needs to stand at the forefront of technology."




