OpenAI executives reveal: Scaling is here to stay, GPT-5's "dual-axis training" breaks the ceiling of intelligence

What does GPT-5 improve, and why does it matter? Where is artificial intelligence heading? And what does OpenAI's striking new model reveal about the different forms intelligence can take?

OpenAI's Chief Operating Officer, Brad Lightcap, addressed these questions in an in-depth conversation.

Why is GPT-5 so special?

GPT-5 achieved a very interesting breakthrough: it can autonomously determine whether to conduct deep reasoning before answering.

In the past, users had to pick a model manually from ChatGPT's model selector depending on the task: for some questions you would switch on a thinking mode, for others you would not. OpenAI considered that experience frankly confusing.

GPT-5 simplifies this entirely. It not only makes that decision for you automatically, it is also simply smarter: in writing, programming, health, and other areas it is more accurate, responds faster, and delivers an overall upgraded experience.

Everyone thought GPT-5's intelligence would grow explosively, so why did OpenAI choose usability over intelligence improvement as the main selling point?

Brad Lightcap explained that this is because intelligence fundamentally depends on the model's thinking time.

The more thinking time a model is allocated, the higher the quality of its answers; that is a basic rule. When GPT-5 is allowed to think on specific benchmarks, it performs far beyond all existing models.

Even with thinking turned off, its answers are generally better than those of non-reasoning models such as GPT-4.1.

So this is a comprehensive leap in intelligence, but the key is the ability to allocate thinking time dynamically; OpenAI sees that as the core of the improved user experience.
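
To make "dynamically allocating thinking time" concrete, here is a minimal conceptual sketch in Python. The difficulty heuristic, thresholds, and names are hypothetical illustrations of the routing idea described above, not OpenAI's actual router.

```python
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    use_deep_reasoning: bool      # whether to run an extended chain of thought
    thinking_budget_tokens: int   # rough cap on "thinking" tokens


def estimate_difficulty(prompt: str) -> float:
    """Toy difficulty estimate: longer, more technical prompts score higher.
    A real router would use a learned classifier, not keyword counts."""
    hard_markers = ("prove", "debug", "diagnose", "optimize", "step by step")
    score = min(len(prompt) / 2000, 1.0)
    score += 0.2 * sum(marker in prompt.lower() for marker in hard_markers)
    return min(score, 1.0)


def route(prompt: str) -> RoutingDecision:
    """Decide, per request, how much thinking time to allocate."""
    difficulty = estimate_difficulty(prompt)
    if difficulty < 0.3:
        return RoutingDecision(use_deep_reasoning=False, thinking_budget_tokens=0)
    if difficulty < 0.7:
        return RoutingDecision(use_deep_reasoning=True, thinking_budget_tokens=2_000)
    return RoutingDecision(use_deep_reasoning=True, thinking_budget_tokens=16_000)


print(route("What's the capital of France?"))
print(route("Debug this distributed lock and prove the retry logic is deadlock-free, step by step."))
```

The design choice being illustrated is simply that the caller no longer picks a model; a per-request decision spends more compute only where the question seems to need it.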

This progress is difficult to define simply as "exponential" or "incremental".

We have entered a stage where intelligence has to be assessed along multiple dimensions; OpenAI is not dodging the question but explaining why GPT-5 is special.

In core capabilities, the improvement is obvious: higher SWE-bench scores and better performance across a range of academic evaluations. OpenAI has also specifically strengthened GPT-5's benchmark performance in the health domain.


Moreover, it is hard to draw the boundary between AGI and non-AGI. Even if such a moment exists, it is not certain everyone will recognize it immediately, because there is already a sizeable capability overhang in working with these models: the "pocket doctor" level of intelligence that Altman mentioned is something people have not yet fully exploited.

In a sense, even if AI development paused for ten years, people would still have roughly ten years' worth of new products to build and new ways to integrate GPT-5-level models into interesting products and workflows.

An interesting phenomenon is that the smarter the model, the more effort product designers have to invest in integrating it into a system.

Brad Lightcap often uses an analogy:

Interns are very smart, but ultimately they do limited things: taking meeting notes, writing summaries, and doing basic analysis.

But if you bring in a PhD, their range of abilities is much broader, though they may not be productive on day one. What you need to do is give them enough context, information, and tools so that they can eventually deliver their full value, and that onboarding takes longer than getting an intern up to speed.

He believes AI models are similar, and this is a continuous process, not a linear one.
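
As a rough illustration of that onboarding idea, the sketch below shows what "giving the model context and tools" can look like through the OpenAI Python SDK's chat-completions interface. The system prompt, the search_contracts tool, and the model name are made-up examples for this article, not anything OpenAI has published.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A hypothetical tool the model is allowed to call; the schema format follows
# the standard chat-completions function-tool convention.
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_contracts",
            "description": "Full-text search over the company's contract repository.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search terms."},
                },
                "required": ["query"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-5",  # illustrative model name
    messages=[
        {"role": "system", "content": "You are an analyst for Acme Corp. "
                                      "Use the provided tools before answering."},
        {"role": "user", "content": "Which of our 2024 supplier contracts auto-renew?"},
    ],
    tools=tools,
)
print(response.choices[0].message)
```

The point of the analogy is that most of the work sits in assembling the right context and tools around the model, not in the call itself.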

This raises a super interesting question: From now on, does it make sense to continue making models smarter? Or should we build auxiliary capabilities? So for OpenAI, is the next goal to continue enhancing intelligence or focus on those "non-intellectual" capabilities?

Brad Lightcap says they want it all.

One part is pure IQ: the ability to recall knowledge and information about how things work.

But there are also reasoning abilities:

How to solve problems using other tools;

Reflection: reviewing its own chain of thought and correcting course promptly when it senses it is on the wrong path or has not yet found the right strategy.

In these aspects, GPT-5 is better than previous systems.
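
Here is a minimal sketch of the reflection pattern described above: draft an answer, have the model critique its own reasoning, and revise while the critique flags a problem. call_model is a placeholder for whatever LLM call you use; the loop structure, not the prompts, is the point.

```python
def call_model(prompt: str) -> str:
    """Placeholder for an actual LLM call (e.g. via an API client)."""
    raise NotImplementedError("wire this to your model provider")


def answer_with_reflection(question: str, max_rounds: int = 3) -> str:
    """Draft, self-critique, and revise until the critique passes or we give up."""
    draft = call_model(f"Answer the question, showing your reasoning:\n{question}")
    for _ in range(max_rounds):
        critique = call_model(
            "Review the reasoning below. Reply 'OK' if it is sound, "
            f"otherwise describe the flaw.\n\nQuestion: {question}\n\nDraft: {draft}"
        )
        if critique.strip().upper().startswith("OK"):
            return draft
        draft = call_model(
            f"Revise the draft to fix this flaw: {critique}\n\n"
            f"Question: {question}\n\nDraft: {draft}"
        )
    return draft  # best effort after max_rounds revisions
```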

For OpenAI, real-world benchmarks are becoming an increasingly important marker of intelligence, more critical than academic benchmarks.

And "continual learning" is definitely one of OpenAI's top priorities.

First Contact with Reasoning AI

A Shock for Free Users

Ethan Mollick of the Wharton School tested GPT-5 ahead of its release and offered an interesting perspective:

If you've been following this development curve, the progress of GPT-5 can be described as a huge leap, but also an unexpected leap.

He also mentioned: "These models have won gold medals in mathematical olympiads. I'm increasingly finding it difficult to understand what these huge advances really mean."

All current models are improving rapidly. The question is: if a model goes from undergraduate-level biology to graduate-level biology, ordinary chatbot users may not perceive the change, even though the model has become smarter.

Some say that for heavy ChatGPT users, this upgrade will be noticeable, but possibly subtle.

But for ordinary users, especially free users, this will be a massive leap. Most free users have never experienced the power of a reasoning model. They have mostly used GPT-4o, largely for short, search-like, turn-based conversations that do not fully exercise the model's capabilities.

So, for many people, this will be their first time using a model with reasoning capabilities. Moreover, this is also their first experience with a "self-reflective" model: depending on the problem's difficulty, the GPT-5 model will independently decide how much time to spend thinking and how high-quality an answer to provide.

This is actually a good thing. If you have always been chasing the most powerful model available, the pace can feel dizzying and progress seems more continuous. But if you have been using the best model from one or two years ago, this leap will be genuinely striking.

Everyone's entry point is different, which is interesting—it's a very personalized experience for each individual.

GPT-5 pays particular attention to health, since it is one of the most common reasons people come to the product, especially when they are dealing with a health issue. This is an important goal for OpenAI.

Two Major Deployment Scenarios

Health and Enterprise

Brad Lightcap believes AI will not replace doctors:

People still need to work with general practitioners or specialists for treatment.

But having a tool that can accompany and provide guidance throughout the process is comforting for many people and can indeed be effective in many situations.

OpenAI has always focused on pushing the model's capabilities in the health sector.

Starting with GPT-5, future models will continue to increase accuracy and reduce hallucination rates.

Specifically, GPT-5's accuracy is about 4 to 5 times that of previous models (depending on the measurement method).

In many ways, enterprise AI has not yet had its "ChatGPT moment".

Compared with the consumer market, enterprise AI is another level of difficulty.

Enterprise processes are complex and typically involve many users and dependencies. They require handling large amounts of context and many tools, and those tools must be used in a particular order, under particular restrictions. When something goes wrong, the tolerance for failure is low.

Only with an improvement in baseline capabilities can AI be effective in the enterprise domain, including abilities to use tools, think systematically, solve problems, recursively correct its own errors, and perform long-context retrieval.

These capabilities are genuinely important, especially at the margins.
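
As a hedged illustration of what "tools used in order, under restrictions, with little fault tolerance" can mean in practice, here is a small Python sketch of a fixed-order pipeline in which each step validates its input and gets at most one retry before the whole run is aborted. The step names and the refund policy are invented for the example.

```python
from typing import Callable


class StepFailed(Exception):
    """Raised when a pipeline step cannot complete or fails validation."""


def check_policy(ctx: dict) -> dict:
    """Example restriction: refunds above a threshold need human approval."""
    if ctx["order"]["amount"] > 500:
        raise StepFailed("refunds above $500 need human approval")
    return {**ctx, "policy_ok": True}


def run_pipeline(order_id: str) -> dict:
    """Run steps in a fixed order; each step gets one retry, then the run fails."""
    steps: list[tuple[str, Callable[[dict], dict]]] = [
        ("fetch_order", lambda ctx: {**ctx, "order": {"id": order_id, "amount": 120.0}}),
        ("check_policy", check_policy),
        ("draft_reply", lambda ctx: {**ctx, "draft": f"Refund approved for order {ctx['order']['id']}"}),
    ]
    ctx: dict = {}
    for _name, step in steps:
        for attempt in (1, 2):          # low fault tolerance: one retry, then abort
            try:
                ctx = step(ctx)
                break
            except StepFailed:
                if attempt == 2:
                    raise               # abort the whole pipeline
    return ctx


print(run_pipeline("A-1042"))
```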

OpenAI has worked with multiple enterprises to test these models, GPT-5 in particular, and has received a great deal of feedback from companies such as Uber, Amgen, Harvey, Cursor, Lovable, and JetBrains.

Companies like Cursor, JetBrains, Windsurf, and Cognition have all reported that GPT-5 now feels like the most powerful coding model, both in interactive coding environments and in more agent-like coding environments.

Additionally, GPT-5 has shown significant improvements in reasoning and problem-solving abilities in other domains.

Harvey is a good example: it works with law firms and relies heavily on the model's ability to analyze cases reliably, accurately, and consistently, with the level of structured thinking that legal analysis demands.

GPT-5 is already very powerful, and there will undoubtedly be even more excellent models in the future.

But currently, OpenAI is only focused on two things: how to get more people to use GPT-5 and how to support partners in developing an ecosystem based on it.

We are still in a stage of scientific exploration, and that is the most exciting part: it is like a race that has only just begun, with OpenAI itself still working to understand the current paradigm.

GPT-5 is an important first step, and only by understanding the present can we see the future.

References:

https://www.bigtechnology.com/p/799049c8-5054-45c0-8ee7-9de1f2191759

This article is from the WeChat public account "New Intelligence", author: New Intelligence, editor: KingHZ, published by 36Kr with authorization.
