[Introduction] Epoch AI's year-end review is here! Surprisingly, AI has not stagnated, but has accelerated.
Epoch AI has recently released quite a few new things.
They tested several open-weight Chinese models on FrontierMath.
The result: the models' best scores on difficulty tiers 1-3 lagged the world's top AI models by about seven months.
At the harder fourth tier, almost all of the Chinese open models failed completely.
The only model to score was DeepSeek-V3.2 (Thinking), which answered one question correctly for roughly 2% of the total (1/48 problems).
To be fair, the Chinese open models are not alone here: models from elsewhere also perform poorly.
Top-tier models like GPT and Gemini consistently score high marks on traditional math benchmarks such as GSM8K and MATH, yet their accuracy on FrontierMath is far lower.
Still, as the table shows, they do at least somewhat better than the Chinese open models. Why? The reason is not yet clear.
The reason all AI models perform poorly is that FrontierMath is no ordinary benchmark: it was created jointly by more than 60 leading mathematicians and endorsed by Fields Medalists.
It is a real mathematics exam, not a plug-in-the-formula quiz or a routine calculus drill. It consists of original, expert-level problems covering number theory, real analysis, algebraic geometry, and category theory, up to research-level problems that take hours or days to solve.
This also shows that on truly hard mathematics, AI is not yet a "problem-solving machine," but more like a primary-school student who occasionally stumbles onto the answer.
AI evolution has accelerated again.
In addition, they released a new data insight report with surprising conclusions—
AI's capabilities are growing faster than ever before!
They track frontier AI capability trends with a composite metric called the Epoch Capabilities Index (ECI).
The results show that since April 2024, the growth rate of AI capabilities has accelerated significantly: to nearly twice its earlier pace!
In other words, over the past few years, AI capabilities have not been on a steady upward trajectory—but rather suddenly began to surge upwards at a faster pace at some point.
Two factors underlie this: reasoning models have grown stronger, and reinforcement learning is receiving far more investment.
Many people feel that AI progress has slowed down because there haven't been any major leaps since the release of GPT-4.
However, data shows that AI progress has never stopped; it has only changed in direction and pace. It has been accelerating in certain core skills, such as reasoning ability, rather than relying on "larger models + more parameters."
Top 10 Insights of the Year
And just now, Epoch AI released a hardcore year-end review.
Throughout 2025, they published 36 data insights and 37 newsletters.
Which of these 70-odd short pieces were the most popular with readers?
Epoch AI's year-end review answers that question.
The following 10 topped the list.
The top 5 are the most popular data insights.
1. AI inference costs are dropping dramatically.
More precisely, the price of LLM inference is falling rapidly, but unevenly across tasks.
Between April 2023 and March 2025, Epoch AI observed the price per token drop by more than 10x at a fixed performance level.
In other words, the price of an AI inference (producing an answer) fell by more than 10x.
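The ">10x cheaper in roughly two years" figure can be annualized with a quick back-of-envelope calculation; the ~23-month window length used here is my approximation of the April 2023 to March 2025 span, not a number from Epoch AI.

```python
# Annualize the article's ">10x cheaper" figure. The ~23-month window
# (April 2023 to March 2025) is an approximation for this sketch.
months = 23
total_drop = 10.0                                   # price fell by at least this factor
annual_factor = total_drop ** (12 / months)
print(f"~{annual_factor:.1f}x cheaper per year")    # ~3.3x per year
```

That is, prices for a fixed capability level fell by a factor of roughly three every year over the observed window.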
As AI becomes more affordable, it will become more accessible to everyone: it will no longer be a technology that only large companies can afford, but a tool that everyone can use!
2. An AI "brain" is coming to your computer.
In just one year, near-frontier AI performance has arrived on consumer hardware.
On several benchmarks, including GPQA, MMLU, the AA Intelligence Index, and LMArena, the best open models that can run on a consumer-grade GPU trail the overall frontier by less than a year.
Since the strongest open models can run on ordinary consumer graphics cards, your laptop may soon be able to run powerful AI models!
Moreover, any cutting-edge AI capabilities could become widely available to the public in less than a year.
3. Most of OpenAI's computing power in 2024 actually went to experiments.
Media reports indicate that in 2024, OpenAI devoted most of its computing resources not to inference or final training runs, but to experiments that support further development.
Yes, not what you might expect: not training, not serving users around the clock, but trial and error, exploration, and experimentation.
This shows that current AI research and development still relies heavily on a large number of experiments, rather than simply running a few benchmarks.
At the same time, the current cost of AI mostly comes from experimentation, rather than training and deployment.
4. Nvidia's chip computing power doubles every 10 months!
Since 2020, the deployed AI computing power of Nvidia chips has more than doubled every year.
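The headline figure and the annual claim are consistent with each other, as a one-line calculation shows (a sketch of the arithmetic, not Epoch AI's data):

```python
# Check that "doubles every 10 months" implies "more than doubled every year":
doubling_months = 10
annual_factor = 2 ** (12 / doubling_months)
print(f"~{annual_factor:.2f}x per year")   # ~2.30x, i.e. more than doubling
```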
Within about three years of release, each new flagship chip comes to account for the vast majority of deployed computing power.
Therefore, it can be said that GPUs are still the core fuel for AI computing, and their growth rate is extremely fast.
To maintain the current pace of AI development, computing resources must grow many times over, so Jensen Huang and the other chipmakers still have plenty of profit ahead of them!
5. Both GPT-4 and GPT-5 represent a major leap forward.
Despite complaints that OpenAI's frequent updates show little real progress, don't believe them!
Both GPT-4 and GPT-5 achieved significant leaps in benchmark tests, far surpassing the performance of their predecessors.
Therefore, this year's AI is not a mere accumulation of incremental innovations, but a true leap in capabilities.
So why were many people disappointed when GPT-5 launched?
Because models have shipped more frequently over the past two years, each release looks like a smaller step; it is not that capability growth has slowed.
Top 5 most popular Gradient articles: the thinking behind the insights
The next five are the most popular articles from the Gradient column.
Gradient is Epoch AI's column of short analysis pieces.
6. Is ChatGPT incredibly power-consuming? Not at all.
What is the average energy consumption of a single GPT-4o query?
The answer: less electricity than running a light bulb for five minutes.
This conclusion has been confirmed by Sam Altman, and it is similar to the per-prompt energy cost Google has reported for Gemini.
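The light-bulb comparison can be sanity-checked with rough numbers. Both figures below are assumptions on my part, not from the article: Epoch AI's widely cited ~0.3 Wh-per-query estimate for GPT-4o, and a 10 W LED bulb.

```python
# Back-of-envelope check of the light-bulb comparison. Both figures are
# assumptions for illustration: ~0.3 Wh per GPT-4o query (Epoch AI's
# widely cited estimate) and a 10 W LED bulb.
query_wh = 0.3               # energy per query, in watt-hours
bulb_wh = 10 * 5 / 60        # 10 W bulb running for 5 minutes, in Wh
print(query_wh < bulb_wh)    # True: one query uses less than the bulb
```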
In other words, concerns about AI's energy consumption are exaggerated relative to the reality.
Of course, AI's energy consumption has been growing exponentially, which could become a major problem in the future.
7. How did DeepSeek improve the Transformer architecture?
This article clearly explains the three core techniques DeepSeek-V3 used to become the strongest open model of its time despite less computing power.
The three are Multi-head Latent Attention (MLA), improvements to the Mixture-of-Experts (MoE) architecture, and a multi-token prediction mechanism.
Just three days after this article was published, DeepSeek released R1, causing a major upheaval in the global AI community. Its performance is comparable to OpenAI o1, but its development cost is a fraction of that.
The entire AI community has learned a lesson: ingenious architectural innovation = lower R&D costs + faster deployment speed.
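One of those three techniques, MoE routing, can be sketched in a few lines. This is an illustrative toy, not DeepSeek's actual implementation: a router scores every expert, only the top-k experts run, and their outputs are mixed by softmax weights over the selected scores.

```python
import math

# Illustrative sketch of Mixture-of-Experts top-k routing (NOT DeepSeek's
# implementation): the router scores experts, only the top-k run, and their
# outputs are combined with softmax weights.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_scores, k=2):
    # select the k highest-scoring experts for this token
    top = sorted(range(len(experts)), key=lambda i: router_scores[i])[-k:]
    weights = softmax([router_scores[i] for i in top])
    # weighted sum over only the selected experts (the rest never run)
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# toy scalar "experts" standing in for feed-forward blocks
experts = [lambda x: x * 2, lambda x: x + 1, lambda x: x * x]
print(moe_forward(3.0, experts, router_scores=[0.1, 2.0, 1.5], k=2))
```

The efficiency win is that compute per token scales with k, not with the total number of experts, which is why MoE lets model capacity grow without a matching growth in inference cost.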
8. How far can reasoning models go? What are their limits?
The authors analyzed the growth pattern and ceiling of reasoning training. The conclusion: reasoning matters, but its growth will not compound indefinitely.
OpenAI and Anthropic stated in early 2025 that their current pace of RL scaling could be maintained for at most 1-2 years before hitting the limits of their own computing infrastructure.
Reasoning ability has become an extremely important extended dimension in model training, and has brought remarkable results in mathematics and software engineering.
However, there are clear limits to growth in this direction, which means the explosive capability gains of 2024-2025 may soon slow down.
This is an important practical reminder for research and development planning.
9. How large is the "AI Manhattan Project"?
Epoch AI compared the Manhattan Project and the Apollo Program to estimate the potential scale of a national-level AI project in the United States.
Their conclusion: such a project could support a training run roughly 10,000 times larger than GPT-4's.
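As a rough scale check: the ~2e25 FLOP figure for GPT-4's training compute below is an assumption on my part (Epoch AI's published estimate), not a number stated in the article.

```python
# Rough scale check. The ~2e25 FLOP figure for GPT-4's training compute is
# an assumption (Epoch AI's published estimate), not from the article.
gpt4_flop = 2e25
project_flop = gpt4_flop * 10_000   # "10,000 times larger than GPT-4"
print(f"{project_flop:.0e} FLOP")   # 2e+29 FLOP
```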
In other words, when AI is treated as a national strategic technology project, its scale can be magnified many times over!
10. Does AI's greatest value lie outside scientific research?
This last one is quite interesting.
We often hear the narrative that once AI can conduct scientific research automatically, technology will explode exponentially, and human productivity will experience an epic leap.
But Epoch AI offered a more sober assessment—
Most of the value created by AI may not come from accelerating research and development (R&D), but rather from the widespread automation of a large number of jobs throughout the economic system.
Historical data suggest why: from 1988 to 2020, R&D activity made a surprisingly limited contribution to overall productivity growth.
Even if AI maximizes "scientific research efficiency," what truly drives the economy may not be breakthroughs in the laboratory, but rather changes in daily work methods.
Herein lies a crucial point of contention!
It's worth noting that leading figures such as Sam Altman, Demis Hassabis, and Dario Amodei all argue that "AI-automated R&D is the key to explosive growth."
If this assessment holds true, then the impact of AI will be rapid and dramatic. It will suddenly cross the "last hurdle of scientific research automation" and achieve a huge leap within a few AI companies.
But Epoch AI proposed another possibility, a more "sociological" version.
AI is more likely to change the world through a slow and decentralized process.
It won't happen overnight, but over several years or even decades, AI will be gradually absorbed by different industries and organizations, replacing repetitive labor.
If this is the case, the AI revolution will not be a sudden explosion, but a long-lasting tide.
References:
https://x.com/EpochAIResearch/status/2003510001277747518
https://x.com/EpochAIResearch/status/2003559099867496872
https://epoch.ai/data-insights/ai-capabilities-progress-has-sped-up
https://x.com/EpochAIResearch/status/2003178174310678644
This article is from the WeChat official account "New Intelligence" (Xinzhiyuan), edited by Aeneas, and published with authorization by 36Kr.



