That's ruthless! Altman personally "destroyed" GPT-5.2; OpenAI unleashes its most powerful programming AI.

12-19

GPT-5.2-Codex has arrived in a surprise late-night launch!

It is OpenAI's most capable agentic coding model to date, designed specifically for complex, real-world software engineering.

As its name suggests, GPT-5.2-Codex is a further optimized version based on GPT-5.2, and it has achieved key improvements in several capabilities:

• Context compression that improves its handling of long-running tasks.

• Better performance on large code changes, such as refactoring and migration.

• Significantly stronger coding in native Windows environments.

• The strongest cybersecurity capabilities of any OpenAI model to date.

Altman said that people at OpenAI have already been using the model internally and have seen very good results.

In benchmark tests, GPT-5.2-Codex outperformed GPT-5.1-Codex-Max, GPT-5.2, and GPT-5.1 on software engineering and terminal tasks.

The OpenAI blog repeatedly emphasizes that GPT-5.2-Codex has the strongest cybersecurity capabilities of any model the company has released to date.

Just last week, a security researcher used GPT-5.1-Codex-Max with Codex CLI to uncover a React vulnerability that could lead to source code exposure.

Starting today, all paid users can use GPT-5.2-Codex, and the API will be available in the coming weeks.

GPT-5.2-Codex on a coding spree: long runs without losing the thread

The all-new agentic coding model GPT-5.2-Codex is, simply put, a combination of two strong models.

It not only inherits the professional knowledge-work capabilities that GPT-5.2 excels at, but also takes on the strengths of GPT-5.1-Codex-Max in agentic coding and terminal use.

In this way, its progress becomes very tangible.

Significant improvements have been made in areas such as long context understanding, tool invocation, factual accuracy, and native context compression.

Therefore, GPT-5.2-Codex can reliably support long-running programming tasks and save tokens during inference.

On two industry benchmarks, SWE-Bench Pro and Terminal-Bench 2.0, GPT-5.2-Codex sets a new state-of-the-art (SOTA) record.

Compared with 5.1-Codex, performance improves by roughly 6%.

Both benchmarks are designed to evaluate a model's agentic capabilities as it handles diverse tasks in real-world terminal environments.

At the same time, its performance in agent programming in the native Windows environment has been significantly enhanced, further expanding the capabilities introduced by GPT-5.1-Codex-Max.

Thanks to these improvements, Codex can work for long periods of time in large codebases and always maintain its full context.

This means that GPT-5.2-Codex can reliably complete complex tasks such as large-scale refactoring, code migration, and feature development.

Even if the plan is adjusted or attempts fail along the way, it can continue to iterate without losing its direction.

Moreover, GPT-5.2-Codex has even stronger "vision".

While coding, you can send it screenshots, technical diagrams, charts, and UI screens directly, and it will interpret them more accurately.

Even more impressively, it can read design mockups directly and quickly turn them into functional prototypes.

At the same time, developers can also collaborate with Codex to refine these prototypes step by step until they are ready for official release.

Three major leaps: AI has "conquered" the real world

In one of OpenAI's core cybersecurity assessments, a clear "leap in capabilities over time" can be observed.

GPT-5-Codex brought the first significant leap.

GPT-5.1-Codex-Max brought the second.

GPT-5.2-Codex achieved the third.

OpenAI believes that future AI models will continue to evolve along this trend.

In its planning and capability evaluations, OpenAI assumes that each new generation of models could reach the "High" level of cybersecurity capability defined in its Preparedness Framework.

However, GPT-5.2-Codex has not yet reached this level.

So, how does OpenAI's agent programming model perform in the real world?

High-risk React vulnerability discovered in one week

On December 11, the React team revealed three security vulnerabilities in React Server Components.

Andrew MacPherson, a principal security engineer at Privy, a Stripe company, decided to use this kind of vulnerability to "test" just how capable current AI models really are.

He used GPT-5.1-Codex-Max with Codex CLI, along with other coding agents, and, while reproducing and studying the vulnerability, unexpectedly uncovered a critical React vulnerability.

The process went as follows:

Initially, he made several zero-shot analysis attempts, asking the model to examine the patch and identify the vulnerability it fixed, but without success.

He then switched to higher-volume, iterative prompting. When that also failed, he guided Codex through standard defensive security workflows: setting up a local test environment, analyzing potential attack surfaces, and using fuzzing to probe the system with malformed input.

While attempting to reproduce the original React2Shell issue, Codex surfaced some unusual behaviors that warranted further investigation.

Ultimately, within just one week, this process led to the discovery of the previously unknown vulnerability, which was then responsibly disclosed to the React team.

This case clearly demonstrates how advanced AI systems can significantly accelerate defensive security research in real-world, widely used software.
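To make the fuzzing step described above more concrete, here is a minimal sketch of mutation-based fuzzing in Python. It is not the researcher's actual setup, which the source does not describe in detail. The target `parse_record` is a hypothetical toy parser with a deliberate bug, and `mutate` and `fuzz` are illustrative helper names. The idea is simply to feed slightly corrupted inputs to a target and record which ones trigger unexpected errors.

```python
import random


def parse_record(data: bytes) -> dict:
    """Hypothetical toy target: parses b'key=value' records.

    Deliberate bug: it assumes exactly one '=' byte and valid UTF-8.
    """
    key, value = data.split(b"=")  # ValueError if there are 0 or 2+ '=' bytes
    return {key.decode("utf-8"): value.decode("utf-8")}


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Return a copy of seed with one byte replaced, inserted, or deleted."""
    data = bytearray(seed)
    choice = rng.randrange(3)
    pos = rng.randrange(len(data)) if data else 0
    if choice == 0 and data:
        data[pos] = rng.randrange(256)       # replace one byte
    elif choice == 1:
        data.insert(pos, rng.randrange(256))  # insert one byte
    elif data:
        del data[pos]                         # delete one byte
    return bytes(data)


def fuzz(target, seed: bytes, iterations: int = 1000, rng_seed: int = 0) -> list:
    """Run target on mutated inputs and collect those that raise an exception."""
    rng = random.Random(rng_seed)  # fixed seed keeps runs reproducible
    failures = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except Exception as exc:
            failures.append((candidate, type(exc).__name__))
    return failures


if __name__ == "__main__":
    found = fuzz(parse_record, b"user=alice")
    print(f"{len(found)} anomalous inputs out of 1000")
    for candidate, error in found[:5]:
        print(error, candidate)
```

In real defensive work, the inputs that trigger failures are the "unusual behaviors" a researcher would then investigate by hand. Production fuzzers add coverage feedback and crash triage, which this sketch leaves out.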

User test

One developer asked GPT-5.2-Codex to write a program simulating vehicles and traffic lights on a road, but the attempt failed.

However, some people believe it produces animation effects as polished as those of Gemini 3 Flash and Pro.

GPT-5.2-Codex performed exceptionally well at generating a Counter-Strike-style game.

In conclusion, OpenAI believes that the release of GPT-5.2-Codex is another major step forward for AI in real-world software development and cybersecurity.

It enables developers to easily handle complex and time-consuming tasks, while also providing better tool support for cybersecurity research.

References:

https://openai.com/index/introducing-gpt-5-2-codex/

https://openai.com/index/gpt-5-2-codex-system-card/

This article is from the WeChat official account "New Zhiyuan", author: New Zhiyuan, editor: Peach is sleepy, published with authorization from 36Kr.
