The inside story of GPT-5.2's failure: the technical team didn't go astray, but users became the biggest victims.


OpenAI's tenth birthday celebration was not a very dignified one.

GPT-5.2, released that day, looked flawless on paper: it posted state-of-the-art results across a raft of benchmarks, excelled in competition-style mathematics and programming, and was officially billed as an AI "super brain".

But on social networks it was met not with applause, but with a barrage of abuse from users.

On X and Reddit, anger and disappointment ran through almost every comment. People once again pined for GPT-4o, the idealized old flame they could not get back: some said GPT-5.2 had become bland and boring, as if its edges had been sanded off; others mocked its preachy tone for "treating adults like kindergarten children."

As public opinion turned against OpenAI and its CEO Sam Altman, a pointed question emerged: why do users like the models less now that they are "smarter"?

Why are "smarter" models no longer popular?

The Information's latest report early this morning revealed the inside story.

Over the past year, OpenAI held to a golden rule: every generational leap in its models brought explosive growth in user numbers, because the better experience that came with "getting smarter" was immediately obvious. Now that golden rule has broken down.

To be sure, the model's gains in intelligence and scientific computation remain significant. The research team spent months refining its reasoning capabilities so it can tackle more complex mathematical and scientific problems, but for most ordinary users the improvement is barely noticeable.

https://www.theinformation.com/articles/openais-organizational-problems-hurt-chatgpt?rc=qmzset

In other words, improvements in intelligence do not necessarily equate to improvements in user experience.

Ordinary users rarely need a "competition-grade brain"; they need a "useful assistant for everyday tasks." OpenAI's large-scale analysis of 1.5 million conversations corroborates this judgment, showing that users' core needs are extremely practical: practical guidance (29%), information retrieval (24%), and writing (24%), while conversations related to programming tasks account for only 4.2%.

The contradiction is very concrete: while the technical team toils away in the lab on math, physics, chemistry, and benchmark scores, users just want their problem solved in one reply in the chat box: no beating around the bush, no lecturing, no padding.

Overstretched battle lines are a major weakness.

For most of this year, Altman has launched new projects in parallel: the video generation app Sora, music AI, a browser, AI agents, hardware devices, robots... The business keeps expanding, and resources keep getting fragmented.

This is a classic mistake among tech giants: rushing to open second and third fronts before the core position is secure. In the short term it looks like casting a wide net; over the long run it is overextension, a cardinal sin in warfare, leaving every front short of manpower, computing power, and the patience to polish products.

The internal tug-of-war between "research priority" and "product growth" at OpenAI is particularly evident in image generation:

Even though GPT-4o's Ghibli-style image generation briefly boosted ChatGPT usage and user growth in March, OpenAI did not initially make further image-model development a priority. Only after Google's Nano Banana won strong word-of-mouth did OpenAI hastily revive the project, which sparked internal disagreement.

Altman believes image models are the key to user growth, while research chief Mark Chen would rather put the resources into other projects.

At the same time, as the marginal returns of scaling laws diminish, OpenAI has spent the past year betting on reasoning models to break through the large-model bottleneck. A research team of more than 1,000 people has concentrated its resources there, and optimization of the everyday ChatGPT experience has been sidelined as a result.

This approach not only scattered resources but also produced performance regressions in testing early in the year: adapting the model to "chat" scenarios actually diluted the purity of the reasoning model. "Thinking" mode and "Deep Research" were later introduced to offload those use cases and patch things up, but adoption was very low, and the everyday conversation experience did not become any more appealing.

In addition, compatibility issues often arise between the old and new models.

For example, before GPT-5's release, researchers found that the model performed worse on some programming tasks once it was integrated into ChatGPT: the system tailored its answers to personalized information such as the user's occupation, and that extra context interfered with the model's understanding and produced wrong answers.
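To make that mechanism concrete, here is a minimal, hypothetical sketch of how a chat product could inject stored personalization into every request before it reaches the model. The profile fields, function name, and message format below are illustrative assumptions, not OpenAI's actual implementation.

```python
# Minimal, hypothetical sketch: how a chat product might prepend stored
# personalization to every request. All names and fields here are illustrative
# assumptions, not OpenAI's actual implementation.

# Hypothetical "memory" about the user, e.g. gathered from earlier chats.
user_profile = {
    "occupation": "elementary school teacher",
    "tone_preference": "simple, non-technical explanations",
}

def build_messages(user_question: str) -> list[dict]:
    """Assemble the message list sent to the model, injecting profile context."""
    profile_note = (
        f"The user is an {user_profile['occupation']} and prefers "
        f"{user_profile['tone_preference']}."
    )
    return [
        {"role": "system", "content": "You are a helpful assistant. " + profile_note},
        {"role": "user", "content": user_question},
    ]

# A coding question now arrives wrapped in unrelated personal context, which can
# nudge the model toward simplified (and sometimes incorrect) answers.
for message in build_messages("Fix the off-by-one error in my binary search."):
    print(f"{message['role']}: {message['content']}")
```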

Admittedly, the reasoning models keep getting more powerful, but the ChatGPT experience keeps getting worse.

When the direction of technological progress and the direction of user needs begin to diverge, who will compromise first? The answer is obvious.

Gemini 3 Pro's strong debut finally backed OpenAI into a corner, producing the now-iconic scene of Altman issuing a "red alert" and demanding that employees refocus on ChatGPT and make the product more appealing.

At the same time, Fidji Simo, OpenAI's head of applications, laid out ChatGPT's vision on her personal blog: shifting from a primarily text-based dialogue system to a fully generative UI that can dynamically build interfaces around user intent.

Simo has also admitted that the company is still fundamentally research-centric, and "the product itself is not the ultimate goal."

Fidji Simo

From a business perspective, this statement is actually quite dangerous.

Unlike Anthropic, which focuses more on the API market, OpenAI's main revenue comes from individual subscriptions. In the consumer market, no one will pay for a company's "ultimate ideals"; users are only willing to pay for the immediate experience. This is like a restaurant chef being obsessed with developing Michelin-starred dishes, while the diners in the lobby simply want a bowl of steaming hot noodles.

However, if you conclude that OpenAI is in disarray because of this, you may be underestimating the company's resilience.

According to Mark Chen, as cited by Bloomberg, the "red alert" is not a new concept, but rather a routine management tool for wartime situations. This mechanism is activated whenever OpenAI needs to focus its efforts on a single objective or requires the team to set aside lower-priority tasks.

Podcast link: https://x.com/Kantrowitz/status/2001790090641645940

On his latest podcast appearance, Altman also pushed back on the idea that sounding the red alert reflected excessive anxiety.

"First of all, the so-called 'red alert' is, in our view, a low-risk but absolutely necessary response measure," Altman admitted. "It's a good thing to be a little 'paranoid' and react quickly when potential competitive threats emerge."

He even mentioned the rise of DeepSeek at the beginning of this year, believing that it, like the current Gemini 3, is a kind of positive external stimulus.

"So far, Gemini 3 hasn't delivered the kind of devastating impact we initially feared. While it, like DeepSeek, precisely hit a nerve in our product strategy, it also forced us to make extremely rapid adjustments."

According to Altman, this state of emergency usually only lasts six to eight weeks. "I'm glad we have this rapid response mechanism; we won't be in this state for too long."

OpenAI clearly understands that slogans alone are not enough: today it officially released GPT-5.2-Codex.

An agentic coding model designed to tackle complex, real-world software engineering problems, GPT-5.2-Codex builds on general intelligence and incorporates the terminal-operation capabilities of GPT-5.1-Codex-Max, making it better suited to long-horizon tasks such as code refactoring and migration.

At the end of the podcast, when the host asked, "How much longer until GPT-6?", Altman replied frankly, "I don't know when we will officially name a model GPT-6, but I expect a new model with significant improvements over 5.2 to be released in the first quarter of next year."

From sounding the "red alert" to counterattacking with the GPT-5.2 series, and then vaguely teasing GPT-6, OpenAI is trying to rebuild confidence with new models and a new cadence. What will decide the long game, however, are still the hard barriers: distribution, ecosystem partnerships, and the cost of computing power.

Google's strategy in plain sight, and Altman's $830 billion "empty fort" gambit.

Google's advantage has never been limited to the Gemini 3 Pro model; it lies more in its almost unparalleled distribution channels.

Search, Chrome, the office suite. Among all tech products, AI chat may have the shallowest moat: users' switching costs are close to zero. When Google's AI products are as ubiquitous as air, the play becomes an almost unanswerable strategy in plain sight: you don't need to be "convinced," you simply use them because they're already there.

More importantly, in its competition with Google, OpenAI's biggest weakness lies in its hardware shortcomings.

Google began building dedicated AI chips (TPUs) twelve years ago and has an efficiency advantage to show for it; OpenAI, by contrast, still spends billions of dollars a year renting computing power. Even as it tries to catch up by building its own data centers and chips, the fact remains that its edge in user experience is being eroded while it is being crushed on cost.

In the words of netizens:

OpenAI doesn't need a more powerful model right now; it needs AMD. If OpenAI acquires AMD, this AI war will be over. Google isn't afraid of OpenAI because it has its own TPU. But what it should really be worried about is OpenAI owning AMD.

In a recent video, OpenAI president Greg Brockman admitted that because computing power is limited, every new feature launch (such as the GPT-4o Ghibli-style images earlier this year) forces compute to be diverted from the research side to the product side. It is a vicious cycle: to sustain today's user experience, tomorrow's technology has to be postponed.

But computing power ultimately comes down to one thing: burning money, and burning it on a massive scale.

According to the WSJ, OpenAI plans to launch a massive $100 billion funding round; if all goes well, the super-unicorn will once again stretch the capital market's imagination with a valuation of $830 billion by the first quarter of next year.

Earlier this year, SoftBank agreed to invest $30 billion in OpenAI; last month it sold $5.8 billion worth of Nvidia shares to help fund that commitment, and it aims to complete the remaining $22.5 billion as soon as possible.

But the money issue isn't so simple. OpenAI's cash burn is projected to exceed $200 billion by 2030. In contrast, Google is financially sound and can even indirectly squeeze OpenAI's funding prospects through stock price fluctuations of partners like Oracle.

OpenAI, raising money wherever it can, seems to be racing against time. Hence the joke: given Altman's fundraising prowess, he might one day manage to "carry off" Google and Nvidia too.

Jokes aside, money can buy time, but it can't buy a good reputation.

So in the winter of 2025, after three years of breakneck growth, OpenAI was right to hit the brakes: consolidate, pull resources back, and refocus on the everyday ChatGPT experience.

This was an expensive but necessary correction.

Technological leadership does not equal a pleasant product, and topping the benchmarks does not guarantee user satisfaction. More importantly, you can't wait until users grow nostalgic for old versions before asking them how they feel.

This article is from the WeChat official account "APPSO", author: APPSO, published with authorization from 36Kr.
