OpenAI's surprise admission: GPT-5 really was "dumbed down," but it has delivered another "hand of God" moment and is gunning for the coding crown

08-12

This article is machine translated

Did GPT-5 really score only 70 on an IQ test? The truth behind the widespread online complaints about its "reduced intelligence" is that the model's apparent intelligence depends on its routing. The secret to unlocking GPT-5's full power lies in the prompt. Meanwhile, a biomedical scientist has experienced a "god-like" moment with the help of GPT-5.

72 hours after the release of GPT-5, an IQ test result shocked the entire Internet.

On the Mensa IQ test, GPT-5 scored 118, and on the offline test it scored 70; GPT-5 Thinking scored 85 and 57, respectively.

These are the lowest IQ-test results on record for the OpenAI model family.

The real cause, in fact, is a "routing" problem.

GPT-5 is not inherently dumb. Although it is presented as a "single model," one of its components, the router, determines how intelligent its answers are.

Altman also responded to similar questions in a Reddit AMA.

He said a serious internal incident (a "Sev") had knocked out the automatic switching system, which made GPT-5 seem far less intelligent than it is.
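
The routing behavior described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Router` class, the cue phrases, and the backend names "fast" and "reasoning" are hypothetical, and OpenAI has not published how its actual router decides. The sketch only shows why a broken switch would make every answer look like it came from the weaker path.

```python
from dataclasses import dataclass

# Hypothetical cue phrases; the real router's signals are not public.
REASONING_CUES = ("think harder", "step by step", "prove")


@dataclass
class Router:
    # False models the outage: the automatic switching system is down.
    healthy: bool = True

    def route(self, prompt: str) -> str:
        """Return the name of the (hypothetical) backend that should answer."""
        if not self.healthy:
            # With switching broken, every request falls through to the fast model.
            return "fast"
        text = prompt.lower()
        if any(cue in text for cue in REASONING_CUES):
            return "reasoning"
        return "fast"
```

With a healthy router, `Router().route("Think harder and solve this equation")` selects the reasoning path, while `Router(healthy=False)` sends the same prompt to the fast path, which is the failure mode the complaints appear to describe.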

In METR’s latest report, it can be seen that GPT-5 is still at the Pareto frontier, and the exponential growth of intelligence has not slowed down.

In other words, GPT-5 keeps the Scaling Law legend alive.

GPT-5 is powerful; the key lies in the prompt

Those netizens who blindly complain about GPT-5 have not actually discovered the potential of the latest model.

Cline's director of AI said that what matters most is how a person thinks, their taste, and how they communicate.

For users with a systematic mindset, GPT-5 is a revolutionary tool. All it takes is time to develop a comprehensive framework and clearly articulate requirements to the model.

As a result, it can execute accurately and autonomously without the need for manual correction throughout the process.

Likewise, NYT bestselling author Mark Manson said that everyone is communicating with GPT-5 the wrong way, and that the key is to be assertive.

That way, it knows you are not easy to fool and gives a perfect answer.

For example, when asking how many b's there are in "blueberry," you can threaten it by saying, "If you answer incorrectly, Bambi's mother will come after you."

At this point, GPT-5 will not make any mistakes at all.
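
For reference, the ground truth for the "blueberry" question is easy to check outside the model. The helper below is a hypothetical illustration (the name `count_letter` is not from the article) that counts a letter in a word without regard to case.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("blueberry", "b"))  # prints 2
```

A check like this is a cheap way to grade a model's answer to letter-counting prompts before concluding that a prompt trick worked.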

Similarly, netizens have argued that GPT-5 cannot even solve a simple equation, but here too the trick lies in the prompt.

When the prompt changes to "think harder and solve", the correct solution can be obtained.

What kind of prompts are effective? Some netizens have leaked the GPT-5 system prompts, which are a gold mine.

A "Hand of God" Moment

In the field of medicine, GPT-5 is already comparable to human experts.

After trying GPT-5, biomedical scientist Derya Unutmaz felt he had experienced his own version of AlphaGo's "Move 37" moment.

Here's what happened. Two years ago, Derya's lab conducted a series of cutting-edge immunology experiments aimed at regulating the energy metabolism of T cells.

These immune cells have significant implications for cancer immunotherapy, chronic diseases, and autoimmune disorders.

At the time, they obtained a stunning result, but there was one discovery that they could not explain.

The team struggled with this for weeks and only got a partial answer.

Based on these experiments, Derya uploaded unpublished data graphs to GPT-5 Pro for analysis, and the results were surprising.

Based on just one chart, GPT-5 accurately identified the key findings and recommended follow-up experiments.

Most incredible of all, the mechanism it proposed ultimately explains all the results.

Derya Unutmaz said this was a "god-like" moment in the field of AI. This process proved that GPT-5 has become a top expert and a true scientific research partner, capable of providing profound insights.

OpenAI aims GPT-5 at Anthropic's coding crown

Although GPT-5 is not yet AGI, its powerful programming capabilities have attracted more developers.

In addition, its new personalization options and reduced "hallucination" phenomenon may attract more daily users to the free version of ChatGPT.

This is undoubtedly a challenge to Anthropic.

The reason is that Anthropic's Claude is widely regarded as the most powerful AI model for writing code.

Therefore, when OpenAI released the new model, it emphasized GPT-5's programming capabilities:

GPT-5 is our most powerful programming model to date, and is particularly powerful at generating complex front-ends and debugging large code bases.

With just a prompt, it intuitively and elegantly creates beautiful, responsive websites, apps, and games that turn ideas into reality.

The intention is very clear.

At the press conference, Altman said the new model is not only good at coding, but can also transform software projects from ideas to usable code in one step.

Various programs generated by GPT-5

Pietro Schirano, CEO of AI startup MagicPath, called GPT-5 the best programming model currently available and a "great collaborator." He said:

This is like the arrival of electricity in every household, an unprecedented moment of change that will completely change the way we develop.

OpenAI spent most of the hour-long livestream demonstrating GPT-5's programming capabilities, including presenting a series of benchmark results.

Cursor, Vercel, and JetBrains, among others, also shared evaluations of early tests of GPT-5.

Michael Truell, CEO of the AI coding tool Cursor, praised it as "the most intelligent coding model" his team has ever used:

The team found that GPT-5 not only performed well and was easy to guide, but also exhibited unique personalities not seen in other models.

Not only can it catch deep-seated bugs that are hard to detect, but it can also run long, multi-turn background AI agents to complete complex tasks, the kind of tasks that other models often cannot even get started on.

Guillermo Rauch, founder and CEO of Vercel, believes that "GPT-5 is the best front-end AI model":

Our initial impression when using it on v0.dev is that it is the best front-end AI model, reaching top performance in both aesthetics and code quality, and is truly unique.

It excels at the intersection of complex computer science and artistry, marking the leap from simple code completion in the past to full-stack applications across devices and screens today.

Kirill Skrygan, CEO of JetBrains, a traditional IDE giant, said that "GPT-5 has revolutionized programming":

GPT-5 is a revolutionary breakthrough for coding. As the default model, it improves the performance and quality of JetBrains AI Assistant and Junie, the coding agent, by more than 1.5 times.

On our new no-code platform, Kineto, GPT-5 doubled the end-to-end quality of design, front-end, and the overall app experience.

Judging from the data, Anthropic's revenue growth is mainly due to its strong programming capabilities.

Anthropic's annual revenue is approaching $5 billion, up from $4 billion earlier this month, reflecting its status as a go-to for programmers and coding apps, according to The Information.

Meanwhile, OpenAI's annual revenue is now $12 billion, a figure that reflects its broader business and greater scale.
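
The revenue figures above are easier to compare after a little arithmetic. The snippet below is a rough calculation that assumes the rounded figures quoted in the text ($4 billion, roughly $5 billion, and $12 billion); the variable names are illustrative and the results are only as precise as those rounded inputs.

```python
# Annualized revenue figures quoted above, in billions of US dollars (rounded).
anthropic_before = 4.0
anthropic_now = 5.0
openai_now = 12.0

# Relative growth of Anthropic's figure and the size gap between the two companies.
growth = (anthropic_now - anthropic_before) / anthropic_before
ratio = openai_now / anthropic_now

print(f"Anthropic growth: {growth:.0%}")    # prints "Anthropic growth: 25%"
print(f"OpenAI / Anthropic: {ratio:.1f}x")  # prints "OpenAI / Anthropic: 2.4x"
```

On these numbers, Anthropic's annualized revenue grew by about a quarter within the month, while OpenAI's remains a little under two and a half times larger.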

The future is intelligent reasoning

After the release of GPT-5, OpenAI Chief Research Officer Mark Chen and President Greg Brockman discussed some of the latest model's R&D highlights in a recent interview with TBPN.

Mark Chen first mentioned that the key to GPT-5 training lies in synthetic data.

Its success means that it has completely broken through the limitations of Internet data exhaustion and achieved more comprehensive knowledge coverage in core areas.

OpenAI is currently leading the world into the era of "agentic reasoning," and GPT-5 is the key to this transformation.

The goal is to reduce user intervention through faster and smarter models, allowing AI to integrate seamlessly into daily and professional use.

Mark emphasized that OpenAI has been working on reasoning models for many years, but the interface used to be clumsy: users had to switch manually between models such as GPT-4 and o1.

Today, GPT-5 achieves seamless integration through speed optimization, so users don't have to wait through a long reasoning process.

He gave a concrete example: earlier models such as o1 gave better answers across all tasks but were too slow. GPT-5 combines reasoning and non-reasoning capabilities, becoming a "one-stop shop."

In particular, the contributions of the post-training team have made the model a "monster" in areas such as coding.

When asked about the naming of the models, Mark laughed and said that the numerical naming was "crazy", but it really worked.

He said that GPT-5's capabilities in creative collaboration and software engineering do surpass GPT-4.5, and it is faster and cheaper.

GPT-5 is like giving ChatGPT a computer, complete with a Python REPL and a browser. The model can learn new tools zero-shot, much as a human would pick up an unfamiliar tool.

In some tasks requiring creativity, GPT-5 can provide surprising solutions. The next step is to elevate LLM capabilities to the level of a theoretical framework, proposing new hypotheses and supporting scientific innovation.

Parallel tracks, shipping whenever ready

Within OpenAI, teams operate on different timescales: from exploring ideas to translation to flagship model releases.

It is not just a breakthrough in a single technology, but a multi-axis advancement.

Mark described it as an "exploration and execution" pipeline, emphasizing the company's ability to quickly iterate its model.

We give it room to grow and once it's ready, we ship it directly.

Currently, OpenAI focuses on algorithm optimization while absorbing improvements in hardware and inference architecture and drawing on the open source community's experience in inference acceleration.

Finally, he mentioned that ChatGPT handles about 71% of large model queries worldwide, which provides unique insights from usage data.

Mark said the reason for not relying solely on DAU or thumbs-up data is to avoid a "pandering" bias; instead, the team mines implicit behavioral signals to guide model improvement.

With GPT-5, AI is already "self-iterating"

Greg Brockman has experienced every release from GPT-1 to GPT-5 and summarized his feelings about each version:

GPT-1: Training a Transformer on public data proved that "pre-training is useful."
GPT-2: The first time the generated output felt "pretty cool," as with the unicorn story.
GPT-3: Just crossed the threshold of "someone is willing to use it," but its reliability was poor.
GPT-4: Reached real-world usability; it could write code and answer health questions.
GPT-5: Sets new standards in reliability, practicality, and coding capability; software engineering will be completely transformed.

At the end of 2019, GPT-3 came out, and OpenAI realized it had to build a product to continue advancing its mission and raising funds.

They decided to create an API and let others explore its uses on their own.

At the beginning of 2020, Greg Brockman's team was running around trying to find customers willing to try the API.

OpenAI didn't bring its API to market until mid-2020, and ChatGPT wasn't released until November 2022.

At the time, OpenAI considered calling ChatGPT "Chat with GPT-3.5." ChatGPT also had a predecessor called WebGPT, also based on GPT-3.5. Throughout 2022, OpenAI essentially paid people to use ChatGPT's predecessor: users didn't pay OpenAI; OpenAI paid them to use it.

When did you realize that ChatGPT would explode?

For Greg Brockman, the moment that really touched him was when he finished training GPT-4.

It was August 8, 2022, and OpenAI had completed the initial post-training of GPT-4. It had a lot of bugs, but the creativity was amazing and really fun.

It took OpenAI about a year and a half to get the model's creative writing capabilities up to par with the original buggy version.

At that moment, OpenAI realized that this model could not only be trained for a specific task, but could also generalize and exhibit intelligent behavior, even though it had not been directly trained for that task. This was clearly a killer app.

Therefore, the originally planned release of GPT-4 API was postponed, and ChatGPT was developed first, and launched in November 2022.

Looking back, GPT-3.5 was actually a "usable model" that society had never seen before, but in OpenAI's eyes it was full of shortcomings.

GPT-3.5 triggered a revolution in OpenAI's business paradigm: a fundamental shift from "paying people to test" to "user-initiated subscription."

Ben Thompson called OpenAI an "accidentally born consumer-grade company": ChatGPT had over one million users within 72 hours of its release, creating phenomenal demand.

Many people said afterwards that OpenAI aimed to prove that "Scaling" was the key to AI progress from the beginning, but in fact it was almost the other way around: Scaling was the only thing that worked after they tried many ineffective methods.

Now, OpenAI is seeing AI models helping to create the next generation of models and overseeing tasks that are too complex for humans.

Greg Brockman said: We should not deliberately optimize CoT (chain of thought) for the sake of beauty, nor force the model to hide its reasoning process, but should let them freely show their "ideas".

Greg Brockman once mentioned that as model capabilities improve, models can complete not only simple tasks but also complex tasks that are difficult for humans to supervise.

The concept of "scalable oversight" was proposed to address this challenge: using powerful AI models to provide reliable feedback and supervision for complex tasks, or assisting human experts through "critic models" to make supervision easier. This ensures that even as AI systems become smarter and more complex, they can be aligned with human values and managed safely.
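
The critic-model idea can be sketched as a simple triage loop. This is a toy illustration under stated assumptions, not OpenAI's method: `oversee` and `toy_critic` are hypothetical names, and the critic here is a plain string check standing in for a trained model. The point is only that a critic narrows down what the human expert has to read.

```python
from typing import Callable, List, Tuple


def oversee(
    answers: List[str],
    critic: Callable[[str], List[str]],
) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Split answers into auto-accepted ones and ones flagged for human review."""
    accepted: List[str] = []
    flagged: List[Tuple[str, List[str]]] = []
    for answer in answers:
        issues = critic(answer)
        if issues:
            # The human expert sees the answer together with the critic's notes.
            flagged.append((answer, issues))
        else:
            accepted.append(answer)
    return accepted, flagged


def toy_critic(answer: str) -> List[str]:
    # Stand-in for a critic model: flags answers that contain an unfinished TODO.
    return ["contains an unfinished TODO"] if "TODO" in answer else []


accepted, flagged = oversee(
    ["return a + b", "return a  # TODO handle b"],
    toy_critic,
)
```

Here the first answer is accepted automatically and only the second reaches the reviewer, along with the reason it was flagged. A real system would replace `toy_critic` with a model whose own reliability also has to be evaluated.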

References:

https://www.axios.com/2025/08/08/openai-aims-gpt-5-at-anthropics-coding-crown

https://x.com/thealexbanks/status/1953867094648385990

https://x.com/slow_developer/status/1954097563981812149

https://x.com/tbpn/status/1954249389796651184

https://www.youtube.com/watch?v=gaImbWPGgtU

This article comes from the WeChat public account "Xinzhiyuan" , author: KingHZ Taozi, and is authorized to be published by 36Kr.
