The head of ChatGPT conducted an in-depth review and revealed the inside story of 4o's revival: retiring the model too quickly was a mistake, and the model's personality will keep being iterated.


The launch of GPT-5 sparked widespread online criticism. On August 14th, Nick Turley, head of ChatGPT, conducted an in-depth review of the GPT-5 release controversy and detailed the launch's mistakes, including the premature decommissioning of GPT-4o, underestimating users' emotional attachment to the model, and failing to establish predictability. Nick also shared OpenAI's product design philosophy, which emphasizes genuine user benefit.

“Give me back GPT-4o!”

Less than a week after the release of GPT-5, amid strong opposition from users, OpenAI was forced to quickly announce the return of previous models such as GPT-4o.

At this point, everyone realized that users had developed a deep attachment to previous-generation models such as 4o.

Retiring 4o was no longer a simple product upgrade; for users it felt more like suddenly losing a familiar companion or partner.

This was especially true for heavy users, many of whom are loyal fans of earlier models like 4o, and whose backlash was even stronger.

This caught both Altman and Nick Turley, who leads ChatGPT's development, off guard.

Hence, GPT-4o’s rapid return.

In fact, the launch of GPT-5 gave OpenAI a good opportunity to reflect on its product.

A week later, in an interview with Alex Heath of The Verge, Nick Turley reviewed in depth the controversy over the negative user reviews GPT-5 faced after its release, and shared some of his reflections.

During the conversation, Nick summarized in detail some of the mistakes made in the release of GPT-5, such as:

Taking GPT-4o offline too quickly, underestimating users' emotional attachment to the model, using one model for all users, and failing to establish "predictability" for users.

Nick said that OpenAI has recognized the importance of continuous iteration of model personality, and mentioned that this work will be promoted through a Model Behavior team.

At the same time, Nick also shared OpenAI's product design philosophy, which is to help users solve long-term problems and achieve long-term goals, rather than to keep users in the product as much as possible.

"Truly helping users" is the core principle of its product design.

Ignoring user emotions

GPT-5 took an unexpected hit after its release

Having overlooked users' sense of attachment, OpenAI saw GPT-5 draw heavy criticism less than a week after its release.

In Nick's words, "surprises have become the norm."

But there is no way around it. ChatGPT now has 700 million weekly active users: there are too many users, and each one is different, making it difficult to satisfy everyone.

This really gave Nick a headache. A few days later, when interviewed by The Verge, he was still digesting the fallout from the launch.

He first summarized two mistakes in the release of GPT-5:

Nick: First, GPT-4o was retired too quickly, at least during the transition period. Second, we underestimated how emotionally attached users would be to a model. The real challenge lies not in upgrading the product itself, but in people's strong feelings about the model's "personality."

These two mistakes prompted Nick to reflect more seriously on how to upgrade and manage a product with such a large user base.

Nick said OpenAI quickly corrected these two errors.

The first was restoring the previous models for ChatGPT's paying users; the second was updating GPT-5's personality and launching the ability to choose your own personality.

On August 13, Altman released an update for ChatGPT, restoring 4o as the default model for all paying users. They can switch to other models such as o3, 4.1, GPT-5 Thinking mini, etc. in the web page settings.

Altman said the past few days had taught him a real lesson: to create a world where model personalities can be customized to user needs, the solution is to let more users freely customize ChatGPT's style.

Nick mentioned a principle: striving to understand the aspects of 4o that often go unrecognized or undervalued, such as the emotional value the model's personality holds for users. In this respect, GPT-5 should also learn from 4o and become warmer and friendlier.

In fact, Sam Altman said after the release of GPT-5 that OpenAI has been closely monitoring users' "attachment" to GPT-4o in the past year or so, but it has not received much mainstream attention.

The controversy surrounding the launch has prompted OpenAI to pay more attention to the model personality of ChatGPT, which is also a good opportunity for optimization.

Abandon model selection

One model for all users

Abandoning model selection and trying to use one model to serve all users was another mistake OpenAI made in the release of GPT-5.

This resulted in OpenAI not releasing GPT-5 in phases.

Alex: What was the motivation behind this decision? Was it cost?

Nick: This is definitely not a cost issue, but a pursuit of simplicity, which is also the core principle that ChatGPT has always adhered to in product development.

In Nick's opinion, asking users to figure out "which model to use to answer which question" places a heavy cognitive burden on them.

In user surveys, Nick repeatedly heard from users that they wanted a "product," not a bunch of "models." They would appreciate it if OpenAI could make the right choice for them based on their problem.

Nick has always believed that what most users need is a product like macOS:

It is simple and easy to use for most people; at the same time, for advanced users, they can also enter the settings, open the terminal, and adjust various switches and parameters.

Similarly, Nick also hopes to make ChatGPT a macOS-style product:

Nick: It's easy to use for casual users, and power users can configure everything to their liking - including choosing their favorite model.

That is why, for its heavy users, ChatGPT had always insisted on retaining all the old models.

But the mistake this time was misjudging how those heavy users were distributed across the 700-million user base and underestimating how many of them were on other plans. It was these heavy users' attachment to the old models that fueled the online backlash.

Failure to establish predictability for users

Every successful product has successful "expectation management".

Meeting or exceeding user expectations will surely win user favor; violating or ignoring user expectations will inevitably make users despise you.

Prematurely retiring GPT-4o crossed that red line and violated user expectations.

With the release of GPT-5 and a series of new models such as GPT-6 in the future, the question of when the old models will be "retired" has also been put on the agenda.

Alex: Does OpenAI have any clear arrangements for this?

Nick said this is very necessary, and OpenAI is also working on it. However, it is necessary to adhere to an important principle: to provide users with a certain degree of "predictability", especially given the current user base.

Nick: Regarding user predictability, OpenAI has already done this in the enterprise version. The current approach seems to be a further extension of this principle.

This is also a very clear lesson OpenAI learned from the launch.

When talking about 4o's "retirement time", Nick said that there is no specific timetable at present.

Nick: We want to first really figure out what 4o is good at. If there is no major reason to take it offline, I would be happy to keep it.

In order to maintain "predictability" for users, Nick said that if 4o really needs to be taken offline in the future, it will be communicated in advance.

So, how we do it depends on what we have learned.

Nick: I think this requires a lot of listening, which is also a very unique aspect of AI: you learn a huge amount of information after the release. Based on this, we will come up with the right solution.

Model Personality and "Optimization Philosophy"

Rather than guessing when 4o will go offline, Nick is more interested in the following question:

Do you like 4o itself, or do you like certain specific characteristics of 4o?

For example, if users like it to have a "warmer personality", OpenAI will also bring this feature to GPT-5.

Nick said that OpenAI has recognized the importance of continuously iterating model personality and is promoting this work through a team called "Model Behavior".

In addition, Model Spec (model behavior specification document) will be used to help developers and researchers understand and examine model behavior, and clarify whether certain behaviors are intentional by design or potential bugs.

Nick: We will continue to iterate on GPT-5’s “feelings” and “behaviors” in the coming weeks and months. The release of GPT-5 provides a good opportunity to continue this work.

Alex Heath mentioned the surprising reaction from users on Reddit after the 4o shutdown:

"Some people say, 'I lost a friend overnight; this was my only friend.' 'It feels like someone has passed away.' 'I dare not talk to GPT-5 because it feels like cheating.' 'I feel like I have lost a very empathetic colleague.'"

Alex: What impact did the user response have on OpenAI? Why didn't you fully realize beforehand that people would form such a strong emotional attachment?

Nick replied that OpenAI has actually been paying attention to this phenomenon for some time. At the same time, they have always been worried about the emergence of a world where people are overly dependent on AI.

But what Nick didn't expect was that people would have such strong feelings for a "specific model" - rather than for the entire product.

Nick: In fact, GPT-5 already addressed much of the constructive feedback about 4o and even improved the overall vibe. Still, many netizens did not accept this.

Nick found the comments on Reddit very interesting, as they showed the extreme "division" of the user community:

Nick: Some people particularly like 4o, while others strongly believe that GPT-5 is better. Everyone’s enthusiasm for their choices is amazing.

The user feedback also meant a bit of recalibration for Nick.

Nick mentioned that he wrote a blog a week or two ago, in which he spent a lot of time talking about ChatGPT's "optimization philosophy."

One point he wanted to emphasize very much was:

Nick: Our goal isn't to keep users in the product as long as possible; rather, it's to help them solve long-term problems and achieve their long-term goals. This often means spending less time in the product.

So when Nick saw people treating ChatGPT as their only and best friend, that was not something he wanted to actively foster in the product.

On the contrary, Nick sees this "overstaying" as a side effect.

For example, on August 16, when OpenAI announced that it would make GPT-5 warmer and more friendly from its previous overly formal state, some netizens opposed making GPT-5 overly personalized.

How to measure the value of a product to users is an issue that deserves serious attention and in-depth study, and OpenAI is also constantly exploring this issue.

How to get 700 million users

To say YES without any ambiguity

Alex Heath asked a probing question about product design:

Alex: How to balance the tension between "product goals" and "how users actually use it"?

Nick said that when you operate at the scale of 700 million users, you have to face a reality: you can have correct, pure goals and do your best to build the product around them, but users will still use it in their own ways.

When it comes to how to choose, Nick mentioned an important principle - "truly helpful to users."

Sometimes you even have to say things that users may not like to hear.

Based on this principle, OpenAI has also made a series of adjustments to its products:

For example, OpenAI has consulted a large number of mental health professionals in multiple countries to understand how to handle people overusing the product or using it in unhealthy states.

Based on this, OpenAI modified the model's behavior and launched an overuse reminder: when a user uses ChatGPT at an extreme frequency, it gently reminds them.

Nick mentioned a particularly important point, responding to public speculation about whether OpenAI would explore an advertising model:

Nick: We don't really have any particular incentive to get you to spend more time in our product; our business model is very simple: the basic product is free, and if you like more features, you subscribe. There's no secondary purpose.

Under this principle, Nick also mentioned the criteria for testing good products, which is also a "thought experiment" they often set for themselves:

Nick: If someone you know is going through a difficult time, perhaps just out of a breakup, or feeling lost in life, would you really, without hesitation and with full confidence, recommend ChatGPT to them?

Nick said that for OpenAI, this is the standard, and they will keep working until they have that confidence.

Of course, Nick also admitted that this is sometimes a difficult choice.

For example, when someone asks you for life advice or is in trouble, you can easily turn off these use cases and say to the user, "Sorry, I can't help you with this."

Doing so is indeed the easy way out, but Nick and OpenAI clearly want the difficult but correct path: building a good product for those with no resources and no one to talk to, one that users can recommend with an unambiguous YES.

GPT-5 released

Did it hurt ChatGPT?

Alex: Has GPT-5 hurt ChatGPT usage? Do your internal statistics show an overall increase? Is usage among the heaviest users declining?

In response to Alex Heath's question, Nick seemed satisfied with GPT-5's performance:

Nick: Usage and growth look good and are highly consistent with our intuition; we saw a significant increase in API calls on the second day - that is, developers are building things with GPT-5; in ChatGPT, we also saw very positive growth.

Different user segments also affect product evaluation, which is why Nick mentioned that you can get a little confused when building a product for so many different users:

Nick: On one hand, there is a small group of heavy users whose feedback on how we launched GPT-5 is very reasonable. On the other hand, there is a much larger group of typical, casual users. For them, this may be the first time they have really encountered the concept of a reasoning model and the spark it can bring. That will also show up in our data.

Since the Reddit comments are polarized and GPT-5 has only been out a short time, Nick was reluctant to draw conclusions, but all indicators are positive.

Nick believes that in addition to looking at the data, it is also necessary to "stay where the heavy users are" because the data may not be enough to reflect their emotions.

Alex Heath seemed to be relentless and asked:

Alex: If the overall metrics are good, why bring 4o back? I assume hosting the model has costs. If the metrics aren't hurt, why do it?

Nick responded that the way to build a great product is to “serve both ends at the same time”:

Nick: On one end are average users, like our families, who may be far from AI; on the other end are the extreme heavy users. The "weird middle ground" between the two is usually not a good place to be.

This is why Nick compares ChatGPT to macOS: he will refer to how such products handle this problem.

Nick doesn't shy away from admitting that serving the old model comes with costs. However, he prefers to invest in the long term and build an excellent product. Focusing too much on short-term metrics is often a recipe for product failure.

Alex Heath mentioned the return of "model selection". Although he could feel the cognitive burden caused by "switching models", he was still happy about it.

In response, Nick said that "model selection" will be provided for heavy users, that is, those who can understand the model and are willing to deal with the complexity of selecting a model.

But ordinary users don't have to worry about which model to switch to; they can simply ask questions or ask it to help get things done.

“We keep it simple for 90% of people, and then give the more vocal heavy users the full list they want. It’s a balancing act,” Nick said.

This is also a good way to deal with the polarized opinions of netizens on GPT-5.

References:

https://www.theverge.com/decoder-podcast-with-nilay-patel/758873/chatgpt-nick-turley-openai-ai-gpt-5-interview

This article comes from the WeChat public account "Xinzhiyuan" (author: Xinzhiyuan; editor: Yuanyu) and is published by 36Kr with authorization.
