After GPT-5, Altman went left and Liang Wenfeng went right

36kr · 08-15

GPT-5 has officially launched. Although it tops the benchmarks, user feedback has been mixed, with many users asking to keep GPT-4o. OpenAI plans to use model routing to match different user needs with models of different capability and compute cost.

For now, OpenAI's effort to build a "unified model" still has a long way to go. GPT-5 delivers no major breakthrough in model capability and no new technical paradigm; OpenAI has leaned instead on product innovation - GPT-5 hallucinates less, is friendlier to use, and helps users solve more concrete problems, but it gains no new capabilities and does not fully resolve the structural defects of large models.

Meanwhile, foreign media recently reported that DeepSeek is training its latest model on domestic chips, though the release date remains uncertain. GPT-5's release seems to signal that large model capability may have hit a ceiling. Facing this "Transformer capability wall", OpenAI has chosen to push the productization of existing capabilities to the extreme and chase the "super app" narrative. With competitive pressure at the model frontier easing, DeepSeek is taking on a "self-sufficiency" side mission.

OpenAI, which aims to bring human society to an "extremely prosperous" state through AGI, is drifting further and further from that goal down the super app path, even as its revenue and valuation continue to soar. DeepSeek, which hopes to probe the limits of AI capability and build an open-source ecosystem that makes the technology broadly accessible, may be solving a different problem altogether.

Perhaps years from now, when people look back at the timeline of the large model industry, they will find that its many paths converged around the releases of DeepSeek-R1 and GPT-4o, and diverged again after GPT-5.

01 GPT-5: Performance Leader Yet Falling Short of Expectations, Accelerating Productization

The market anticipated a paradigm shift, a moment that would redefine human-machine interaction. What arrived looks more like a routine upgrade: more parameters, broader training data, higher scores on some benchmarks, but no revolutionary progress in core intelligence. New York University professor emeritus Gary Marcus summed up GPT-5's performance in three words: "Late, overhyped, mediocre".

His analysis points out that GPT-5 has not eliminated the inherent defects of large language models. It still fabricates facts at times, the so-called "hallucination" problem; it still slips up on tasks that require multi-step logical reasoning; and in the multimodal capabilities that underpin real-world understanding, there has been no qualitative improvement.

These issues existed in the GPT-4 era, and the industry had hoped GPT-5 would solve them. Instead, OpenAI chose to patch and optimize the existing framework, and to offer a more productized, user-friendly tool on top of those capabilities.


If the stagnation of core intelligence is what technical experts and heavy users perceive, the limited progress in multimodal capability is what disappointed technology enthusiasts. Before GPT-5's release, the consensus was that multimodality would be the decisive battlefield for next-generation AI. People imagined GPT-5 seamlessly receiving, understanding, and integrating information across text, images, audio, and video. In practice, GPT-5's multimodal interaction feels more like an optimized GPT-4V: it can accurately handle descriptive tasks such as identifying objects in a photo, but its limits show as soon as the task shifts from description to understanding.

OpenAI has always been the industry's benchmark: it was the first to pair Transformer capabilities with language, it opened the large model era with ChatGPT, and it wove reinforcement learning into large model training to break through the ceiling on reasoning ability. After GPT-5's release, however, beyond the "unmet expectations" around performance, the features drawing outside attention appear to be product-level changes.

OpenAI hopes that "model routing" will spare users from having to choose among multiple models, lower the barrier for newcomers, and allocate computing power sensibly so that limited resources can deliver higher-quality service to more users.
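In spirit, a model router sits in front of several models and decides, per request, which one should answer, trading quality against compute cost. The sketch below illustrates the general idea with a crude length-and-keyword heuristic in Python; the model names, costs, and threshold are invented for illustration, and OpenAI has not disclosed how GPT-5's actual router works.

```python
# Minimal sketch of the "model routing" idea: send each request to a cheaper
# fast model or a pricier reasoning model based on a rough complexity score.
# Model names, costs, keywords, and the threshold are all illustrative.
from dataclasses import dataclass

@dataclass
class ModelChoice:
    name: str
    cost_per_1k_tokens: float  # hypothetical relative cost

FAST = ModelChoice("fast-chat-model", 0.2)            # hypothetical cheap model
REASONING = ModelChoice("deep-reasoning-model", 2.0)  # hypothetical expensive model

def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a learned router: longer prompts and
    reasoning-flavored keywords push the score toward 1.0."""
    keywords = ("prove", "step by step", "debug", "optimize", "derive")
    score = min(len(prompt) / 2000, 1.0)
    score += 0.3 * sum(k in prompt.lower() for k in keywords)
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.5) -> ModelChoice:
    """Easy queries go to the cheap model, hard ones to the expensive one."""
    return REASONING if estimate_complexity(prompt) >= threshold else FAST

if __name__ == "__main__":
    for p in ["What's the capital of France?",
              "Prove step by step that the sum of two even numbers is even."]:
        print(p[:40], "->", route(p).name)
```

A production router would more likely be a small learned classifier trained on feedback and cost signals, but the allocation logic it implements is the same.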

According to OpenAI, although GPT-5 cuts hallucinations significantly, its performance on basic math problems and its understanding of the real world remain unsatisfactory, with plenty of obvious errors. Meanwhile, because the training data skews toward productivity-related content, its emotional intelligence has noticeably regressed, driving everyday chat users to demand GPT-4o's "return", with some threatening to unsubscribe.

GPT-5 shows OpenAI "lying flat" on large model capability breakthroughs, all but announcing that the "large model capability wall" has arrived, or at least that technical breakthroughs have temporarily entered a plateau. Whether model capability can return to the "fast lane" it enjoyed from GPT-3 to GPT-4o depends on researchers making breakthroughs in the underlying technology.

A summary of AI research trends that OpenAI's former chief scientist Ilya Sutskever gave in a late-2023 interview now reads almost like a prophecy of this moment.


"Different researchers and projects will have different directions in a period, and when people discover a technology works, research will quickly converge in that direction, and then possibly return to a state of diverse exploration".

02 Can Liang Wenfeng Seize the Opportunity to Achieve Domestic Large Model "Self-Sufficiency"?

If the Transformer technology wall has truly arrived, what can we reasonably expect from DeepSeek? Looking back at DeepSeek's release history, each heavyweight release has, in its own time, solved an important problem in large model technology.

The DeepSeek-V2 series, released in May 2024, tackled the efficiency of long-context processing: it pioneered the Multi-Head Latent Attention (MLA) mechanism, supported contexts of up to 128K tokens, and, with extremely low API pricing (2 yuan per million tokens), set off a price war among Chinese AI vendors, markedly improving the affordability and real-world deployability of large models.
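The core trick in MLA, as described in the DeepSeek-V2 technical report, is to cache one small low-rank latent per token instead of full per-head keys and values, which is what makes very long contexts affordable. Below is a minimal, illustrative PyTorch sketch of that compression step; the dimensions are made up, and real MLA also handles rotary position embeddings and other details omitted here.

```python
# Illustrative sketch of low-rank KV compression, the idea behind MLA:
# cache a small latent per token, expand it back to keys/values when needed.
import torch

d_model, n_heads, d_head, d_latent = 1024, 8, 128, 128  # illustrative sizes

W_down = torch.randn(d_model, d_latent) / d_model**0.5            # compress to latent
W_up_k = torch.randn(d_latent, n_heads * d_head) / d_latent**0.5  # latent -> keys
W_up_v = torch.randn(d_latent, n_heads * d_head) / d_latent**0.5  # latent -> values

def cache_step(h: torch.Tensor) -> torch.Tensor:
    """Only this latent (d_latent floats per token) goes into the KV cache,
    instead of 2 * n_heads * d_head floats for full keys and values."""
    return h @ W_down

def expand_kv(latent_cache: torch.Tensor):
    """Reconstruct per-head keys and values from the cached latents."""
    T = latent_cache.shape[0]
    k = (latent_cache @ W_up_k).view(T, n_heads, d_head)
    v = (latent_cache @ W_up_v).view(T, n_heads, d_head)
    return k, v

hidden = torch.randn(16, d_model)        # hidden states for 16 cached tokens
latents = cache_step(hidden)
k, v = expand_kv(latents)
print(latents.shape, k.shape, v.shape)   # (16, 128), (16, 8, 128), (16, 8, 128)
```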

DeepSeek-V3 arrived in December 2024 with a 671B-parameter MoE architecture. It attacked the inference-speed pain point with a roughly 3x speedup to 60 tokens per second, approached GPT-4o's performance while staying resource-efficient, and went a long way toward closing the gap between open-source and closed-source models.
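The MoE design is what lets total parameters reach 671B while per-token compute stays modest: a gating network picks a few experts per token and only those experts run. The PyTorch sketch below shows generic top-k expert routing; the sizes and top-k value are illustrative, not DeepSeek-V3's actual configuration.

```python
# Generic top-k Mixture-of-Experts routing: each token activates only a few
# expert MLPs, weighted by renormalized gate scores. Sizes are illustrative.
import torch
import torch.nn.functional as F

d_model, n_experts, d_ff, top_k = 256, 8, 512, 2

gate = torch.nn.Linear(d_model, n_experts, bias=False)
experts = torch.nn.ModuleList([
    torch.nn.Sequential(
        torch.nn.Linear(d_model, d_ff), torch.nn.GELU(), torch.nn.Linear(d_ff, d_model)
    )
    for _ in range(n_experts)
])

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """x: (tokens, d_model). Each token is processed only by its top-k experts."""
    scores = F.softmax(gate(x), dim=-1)                 # (tokens, n_experts)
    weights, idx = scores.topk(top_k, dim=-1)           # keep top-k experts per token
    weights = weights / weights.sum(dim=-1, keepdim=True)
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e in range(n_experts):
            mask = idx[:, slot] == e                    # tokens whose slot-th pick is e
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * experts[e](x[mask])
    return out

tokens = torch.randn(4, d_model)
print(moe_forward(tokens).shape)  # torch.Size([4, 256])
```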

DeepSeek-R1, released in January 2025, focused on reasoning. It matched or surpassed OpenAI's o1 on tasks such as AIME and MATH at a far lower cost, topped the US App Store, lowered the barrier to high-end AI, and accelerated the global spread and democratization of open-source AI.

After V3 and R1 made DeepSeek a household name, the company seems to have transformed from a quantitative-trading firm that became famous for large models into a technology company carrying a broader mission.

According to foreign media reports, DeepSeek is now moving the training of its most advanced models onto domestic chips. The road to fully domestic large models is far harder than most people imagine. But amid geopolitical instability and other pressures, as long as Chinese AI companies cannot escape their dependence on NVIDIA GPUs, a sword of Damocles will hang permanently over their heads.

At this moment, GPT-5's release suggests that Transformer-centered large model technology has, for now, flattened its development curve. That sends a signal to every technology company, DeepSeek included - they can afford to pursue side missions while continuing to improve model performance steadily.

Even for a top AI company that has turned large model R&D from an "atomic bomb" into a "tea egg" - from an exclusive, costly feat into an everyday commodity - localizing frontier models from training through inference is a challenge no less difficult than building a brand-new "atomic bomb". The technical problems to be solved along the way may well outnumber all the challenges overcome before every previous DeepSeek release.

First, domestic GPUs still trail NVIDIA's single-card performance by close to a generation. Even if denser interconnects can partly compensate for the single-card gap, going up against the "100,000-card clusters" that Silicon Valley labs train on, and training truly frontier models on domestic GPUs, means confronting engineering challenges on an almost unimaginable scale.

Large model R&D also depends on open-source frameworks such as PyTorch and TensorFlow, which were originally optimized for mainstream international hardware. To localize, DeepSeek would have to migrate the entire software stack to domestic hardware, rewriting or adapting large amounts of code for local compute architectures. Building a domestic software stack that approaches the performance and stability of the mature mainstream frameworks and the CUDA ecosystem, each refined over nearly a decade, is just as challenging.
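At the application layer, the first step of such a migration is usually making model code device-agnostic so the backend can be swapped underneath it. The sketch below shows that pattern in PyTorch; the torch_npu import (Huawei Ascend's PyTorch plugin, which registers an "npu" device) is only one assumed example of a domestic backend, and, as noted above, the hard part is everything beneath this device string: kernels, compilers, and distributed training.

```python
# Sketch of device-agnostic model code: the model never hard-codes CUDA, and
# the backend is picked at startup. The torch_npu plugin is an assumed example
# of a domestic backend and may not be installed; everything falls back safely.
import torch

def pick_device() -> torch.device:
    try:
        import torch_npu  # noqa: F401  # assumed vendor plugin for an "npu" backend
        if torch.npu.is_available():
            return torch.device("npu")
    except (ImportError, AttributeError):
        pass
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(512, 512).to(device)   # model code stays backend-agnostic
x = torch.randn(8, 512, device=device)
print(device, model(x).shape)
```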

However, if DeepSeek can keep working closely with domestic hardware manufacturers and, as it did with its large model R&D, start from scratch and climb step by step to the industry's forefront, there is hope of removing that sword of Damocles for good.

On the front of improving large model training and inference efficiency, DeepSeek has also kept exploring, with remarkable results.

In late July this year, "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention", a paper from the DeepSeek team and institutions including Peking University, with Liang Wenfeng as corresponding author, won a Best Paper Award at ACL 2025.

https://arxiv.org/abs/2502.11089

The paper brings sparse attention out of theory and into a complete, end-to-end training process for the first time, preserving model performance while improving training efficiency and delivering up to 11x inference acceleration. Winning a Best Paper Award at ACL, the top natural language processing conference, speaks to the industry's recognition of the technique's value.
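For intuition, the heart of such sparse attention schemes is letting each query attend to only a few key blocks chosen by a cheap scoring pass, rather than to the whole sequence. The PyTorch sketch below is a heavily simplified, non-causal illustration of that block-selection step; NSA itself additionally uses compressed-token and sliding-window branches plus hardware-aligned kernels, none of which are shown here.

```python
# Simplified block-sparse attention: score key blocks with a cheap summary
# (a mean over each block), keep the top-k blocks per query, and attend only
# within the selected blocks. Non-causal and unoptimized; for intuition only.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=16, top_k_blocks=4):
    """q, k, v: (seq, d). Each query attends to a few selected key blocks."""
    seq, d = k.shape
    n_blocks = seq // block_size
    k_blocks = k[: n_blocks * block_size].view(n_blocks, block_size, d)
    v_blocks = v[: n_blocks * block_size].view(n_blocks, block_size, d)

    block_summary = k_blocks.mean(dim=1)               # (n_blocks, d) cheap summary
    block_scores = q @ block_summary.T                 # (seq, n_blocks)
    sel = block_scores.topk(min(top_k_blocks, n_blocks), dim=-1).indices

    out = torch.zeros_like(q)
    for i in range(q.shape[0]):
        ks = k_blocks[sel[i]].reshape(-1, d)           # keys of selected blocks
        vs = v_blocks[sel[i]].reshape(-1, d)
        attn = F.softmax(q[i] @ ks.T / d**0.5, dim=-1)
        out[i] = attn @ vs
    return out

q, k, v = (torch.randn(128, 64) for _ in range(3))
print(block_sparse_attention(q, k, v).shape)  # torch.Size([128, 64])
```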

The willingness to publicly share innovations that matter in commercial competition also reflects DeepSeek's continued determination, and ability, to make large model technology broadly accessible.

It remains to be seen what surprises the next DeepSeek models, incorporating more innovations like native sparse attention, will bring in capability and efficiency, and how far they can push the localization of large model R&D.

This article is from the WeChat public account "Facing AI" (ID: faceaibang), authors: Hu Run, Miao Zheng, published with authorization by 36Kr.
