Source: Quantum Number
The financial world did not start to panic about DeepSeek until about a month later, but when the panic hit, NVIDIA's market value shrank by more than $500 billion (about 3.6 trillion yuan), the equivalent of an entire Stargate project. Nor was it just NVIDIA: the market values of Broadcom, Tesla, Google, Amazon, and Microsoft also fell.
According to Scale AI CEO Alexandr Wang, the two artificial intelligence models DeepSeek recently released rival the best models from US laboratories. And DeepSeek appears to be working under constrained conditions, which means its training costs are far lower than those of its US counterparts. Reportedly, the final training run of its recent model cost only $5.6 million (about 40.6 million yuan), roughly what US labs pay an individual AI expert in salary. Last year, Anthropic CEO Dario Amodei said training a model costs anywhere from $100 million (about 725 million yuan) to $1 billion (about 7.25 billion yuan). According to OpenAI CEO Sam Altman, GPT-4 cost more than $100 million (about 725 million yuan) to train. DeepSeek seems to have upended our assumptions about the cost of artificial intelligence, which could have an enormous impact on the entire industry.
All of this happened in just a few weeks. On Christmas Day, DeepSeek released a model (V3) that attracted widespread attention. Its second model, the R1 reasoning model released last week, was described by venture capitalist and Trump advisor Marc Andreessen as "one of the most astonishing and impressive breakthroughs I've ever seen." David Sacks, Trump's AI and crypto czar, said the progress of DeepSeek's models shows that "the AI race will be very intense." Both models are partially open source, minus the training data.
DeepSeek's success raises the question of whether billions of dollars in computing power are really needed to win the AI race. The conventional wisdom has been that big tech companies would dominate artificial intelligence simply because they have the spare cash to chase advances. Now it looks as if big tech has simply been burning money. Calculating the actual cost of these models is tricky, because, as Scale AI's Wang pointed out, sanctions mean DeepSeek may not be able to say truthfully how many GPUs it has.
Even if the critics are right and DeepSeek has not been truthful about how many GPUs it has (back-of-the-napkin math on its optimization techniques suggests it is telling the truth), the open-source community will figure it out soon, said Hugging Face research lead Leandro von Werra. His team began replicating and open-sourcing the R1 recipe last weekend, and once researchers can create their own versions of the model, "we'll find out pretty quickly whether the numbers hold up."
What is DeepSeek?
Founded two years ago and led by CEO Liang Wenfeng, DeepSeek is one of China's top AI startups. The company was spun off from a hedge fund founded by engineers from Zhejiang University and focuses on "architectural and algorithmic innovations that could change the rules of the game" in order to build Artificial General Intelligence (AGI) - at least that's what Liang says. Unlike OpenAI, the company also claims to be profitable.
In 2021, Liang began buying thousands of NVIDIA GPUs (just before the US imposed chip sanctions), and in 2023 he launched DeepSeek with the aim of "exploring the essence of AGI" - AI as intelligent as humans. Like OpenAI CEO Sam Altman and other industry leaders, Liang talks in grand terms. "Our goal is AGI," Liang said in an interview, "which means we need to research new model structures to achieve stronger model capabilities with limited resources."
And that is what DeepSeek has done. The team adopted several innovative technical approaches to make its models run more efficiently, and claims the final training run for R1 cost $5.6 million (about 40.6 million yuan), a 95% reduction from OpenAI's o1. DeepSeek didn't start from scratch, but built its AI on top of existing open-source models - specifically, researchers used Meta's Llama model as a base. While the company hasn't disclosed its training data mix, DeepSeek does say it used synthetic data, or artificially generated information (which may become more important as AI labs appear to be hitting a data wall).
Without the training data, it's unclear to what extent R1 is a "copy" of o1 - did DeepSeek use o1 to train R1? In a post when the first paper was released in December, Altman wrote that "it is (relatively) easy to copy something that you know works," while "it is extremely hard to do something new, risky, and difficult when you don't know if it will work." The implication being that DeepSeek isn't creating new frontier models, just copying old ones. OpenAI investor Joshua Kushner also seemed to suggest that DeepSeek "was trained on the leading frontier models from Silicon Valley."
Former OpenAI policy researcher Miles Brundage said R1 relied on two key optimizations: more efficient pre-training and reinforcement learning on chain-of-thought reasoning. DeepSeek found smarter ways to train AI on cheaper GPUs, in part by using a newer technique that has the AI "think" through problems step by step via trial and error (reinforcement learning) rather than by imitating humans. This combination allowed the model to reach o1-level capabilities with far less computing power and money.
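To make the reinforcement-learning idea concrete, here is a deliberately tiny, self-contained Python sketch of the general pattern: sample candidate chains of thought, score them with a rule-based verifier, and upweight the steps that led to a verified answer. The toy task, the tabular "policy," and the reward rule are all illustrative assumptions for exposition, not DeepSeek's training setup.

```python
# Toy sketch of reinforcement learning on chain-of-thought with a
# verifiable reward. The "policy" is a table of preference weights
# standing in for an LLM; the task is trivially checkable arithmetic.
import random

STEPS = [1, 2, 3, 4, 5]   # the "vocabulary" of reasoning steps
TARGET = 9                # a verifiable ground-truth answer
CHAIN_LEN = 3             # length of each chain of thought

# One weight table per position in the chain (stand-in for a policy).
weights = [{s: 1.0 for s in STEPS} for _ in range(CHAIN_LEN)]

def sample_chain():
    """Sample a chain of thought from the current policy weights."""
    chain = []
    for pos in range(CHAIN_LEN):
        total = sum(weights[pos].values())
        r, acc = random.uniform(0, total), 0.0
        for step, w in weights[pos].items():
            acc += w
            if r <= acc:
                chain.append(step)
                break
    return chain

def reward(chain):
    """Rule-based verifier: 1 if the reasoning reaches the right answer."""
    return 1.0 if sum(chain) == TARGET else 0.0

# Trial-and-error loop: reinforce steps that appeared in verified chains.
for _ in range(2000):
    chain = sample_chain()
    r = reward(chain)
    for pos, step in enumerate(chain):
        weights[pos][step] *= (1.0 + 0.1 * r)   # upweight rewarded steps

wins = sum(reward(sample_chain()) for _ in range(1000))
print(f"verified-answer rate after training: {wins / 1000:.0%}")
```

The key feature of this pattern is that the reward comes from checking the final answer, not from imitating human-written reasoning traces.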
"DeepSeek v3 and the previous DeepSeek v2 are basically the same model as GPT-4, just with more clever engineering tricks to get more bang for their GPU buck," Brundage said.
It's worth noting that other labs use these techniques too (DeepSeek used a "mixture of experts" approach, activating only part of the model's capacity for a given query; GPT-4 reportedly did the same). DeepSeek's versions innovated on this concept by creating more finely grained expert categories and developing more efficient communication between them, making the training process itself more efficient. The DeepSeek team also developed a technique called DeepSeekMLA (multi-head latent attention), which greatly reduces the memory required to run AI models by compressing how the model stores and retrieves information.
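As a rough illustration of the mixture-of-experts idea, the NumPy sketch below routes a token to only its top-scoring experts, so compute scales with the number of experts activated rather than the total number of experts. The dimensions, expert count, and top-k value are arbitrary assumptions for illustration; this is the generic MoE pattern, not DeepSeek's architecture (and it does not attempt to show MLA's memory compression).

```python
# Minimal mixture-of-experts routing sketch: only TOP_K of N_EXPERTS
# experts run per token, so FLOPs scale with TOP_K while total
# parameters scale with N_EXPERTS.
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_EXPERTS, TOP_K = 16, 8, 2

# Each expert is a small feed-forward layer; a router scores them.
experts = [rng.normal(size=(D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
           for _ in range(N_EXPERTS)]
router = rng.normal(size=(D_MODEL, N_EXPERTS)) / np.sqrt(D_MODEL)

def moe_layer(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router                   # score every expert
    top = np.argsort(logits)[-TOP_K:]     # keep only the top-k experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                  # softmax over the chosen experts
    # Only the selected experts are evaluated at all.
    return sum(g * np.maximum(x @ experts[e], 0.0)   # ReLU expert FFN
               for g, e in zip(gates, top))

token = rng.normal(size=D_MODEL)
print(moe_layer(token).shape)   # (16,) - same output shape, a fraction of the compute
```

The design trade-off is capacity versus cost: the model can hold many specialized experts, but each query only pays for the few it actually uses.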
What has shocked the world is not just the architecture of these models, Brundage added, but that DeepSeek was able to replicate OpenAI's achievements in a matter of months, rather than the year-plus gap typical of major AI breakthroughs.
OpenAI has positioned itself as uniquely capable of building advanced AI, and that public image has won over the investors building the world's largest AI data center infrastructure. But DeepSeek's rapid replication shows that technical advantages don't last long - even when companies try to keep their methods secret.
"In a sense, these closed-off companies clearly rely on people thinking they're doing the greatest things to survive, and that's how they maintain their valuations. Maybe they exaggerated a bit to raise more funding or build more projects," von Werra said. "As for whether they exaggerated their internal capabilities, no one knows, but it's clearly to their advantage."
Talking Money
Since OpenAI released ChatGPT in 2022, the investment world has been captivated by AI. The question is not whether we are in an AI bubble, but "is the bubble a good thing?" ("Bubbles have been unfairly given a negative connotation," Deepwater Asset Management wrote in 2023.)
It's still unclear whether investors understand how AI works, but they do hope that, at a minimum, it can cut costs broadly. A PwC report released in December 2024 found that two-thirds of surveyed investors expect AI to raise productivity, and a similar share expect profits to rise as well.
The company that has benefited most from the hype cycle is NVIDIA, which makes the sophisticated chips AI companies rely on. The thinking goes that in an AI gold rush, buying NVIDIA stock is investing in the company that makes the shovels: whoever dominates the AI race will need a lot of NVIDIA chips to run their models. On December 27, NVIDIA's stock closed at $137.01 (about 993.42 yuan) - nearly 10 times its price in early January 2023.

DeepSeek's success has upended the investment theory that drove NVIDIA's share price soaring. If the company really is using chips more efficiently (rather than simply buying more of them), other companies will start doing the same. That could shrink the market for NVIDIA's most advanced chips as companies everywhere work to cut costs. "Growth expectations for NVIDIA were indeed a bit 'optimistic,' so I think this is a necessary reaction," said Naveen Rao, VP of AI at Databricks. "NVIDIA's current revenue is unlikely to be threatened, but the dramatic growth of the past few years may be affected."

NVIDIA is not the only company propelled by this investment thesis. In 2023, the "Magnificent Seven" - NVIDIA, Meta, Amazon, Tesla, Apple, Microsoft, and Alphabet - outperformed the rest of the market, gaining 75% in value. They continued that remarkable bull run in 2024, with every one of them except Microsoft beating the S&P 500 index. Only Apple and Meta were left unscathed by the DeepSeek selloff.

The frenzy is not confined to the public markets. As venture capital firms pour money into the field, startups such as OpenAI and Anthropic have reached dazzling valuations - $157 billion (about 1,138.4 billion yuan) and $60 billion (about 435 billion yuan), respectively. Profitability is not a priority: OpenAI is expected to lose $5 billion (about 36.3 billion yuan) in 2024, against expected revenue of $3.7 billion (about 26.8 billion yuan).

DeepSeek's success shows that simply throwing money at the problem is not as protective as many companies and investors assume. It suggests that small startups can be far more competitive with the giants - even disrupting the known leaders through technical innovation. So while this is bad news for the big players, it may be good news for small AI startups, particularly since DeepSeek's models are open source.