Two bombshells hit Silicon Valley overnight: GPT-5.3-Codex takes on Claude Opus 4.6, and Altman is clearly worried.


Within a single day, two major coding AIs shook Silicon Valley. Right after Claude Opus 4.6 launched, Sam Altman rushed out GPT-5.3-Codex. The clash between the two giants has raised the curtain on the fight for the AI throne.

Silicon Valley will have a sleepless night!

Claude Opus 4.6 launched late at night without warning, catching Altman off guard.

In response, OpenAI rushed out its most powerful agentic coding model, GPT-5.3-Codex, within just half an hour.

There is no GPT-5.3, only GPT-5.3-Codex!

It combines the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge of GPT-5.2, and it runs 25% faster.

It can handle long-running tasks that involve in-depth research, tool use, and complex execution.

GPT-5.3-Codex is like a colleague working alongside you, allowing you to guide and interact with it in real time while it works, without any worry about losing context.

It is worth mentioning that GPT-5.3-Codex is also the first model to play a key role in its own creation process.

With the release of GPT-5.3-Codex, the role of Codex has undergone a qualitative leap:

From an AI agent that could only write and review code, it has evolved into an AI agent capable of doing almost anything that developers and professionals can do on a computer.

GPT-5.3-Codex is now included in paid ChatGPT plans and is available everywhere Codex runs: the app, the CLI, the IDE extension, and the web.

Today, Silicon Valley has become a battleground for the two giants, Anthropic and OpenAI, with the air thick with tension.

Interestingly, Altman had originally planned to announce the new model at midnight, but Anthropic got its release out first.

Overnight, two of the most powerful coding AIs went head to head, prompting netizens to complain, "We simply can't keep up with the speed of AI iteration."

GPT-5.3-Codex is here, offering enhanced coding capabilities.

Just how powerful is GPT-5.3-Codex? The benchmark results speak for themselves.

Software engineering: a new state of the art

GPT-5.3-Codex set a new industry high on SWE-Bench Pro, a benchmark that evaluates real-world software engineering.

At the same time, on Terminal-Bench 2.0, which measures the terminal skills of coding agents, its performance far surpasses the previous state of the art (SOTA).

It's worth noting that GPT-5.3-Codex consumes far fewer tokens to achieve all of this than any previous model.

Unlike SWE-bench Verified, which only tests Python, SWE-Bench Pro covers four languages, making it not only more resistant to data contamination but also more challenging, diverse, and industry-relevant.

Creating a game from scratch

By combining frontier coding capabilities, aesthetic improvements, and context compaction, GPT-5.3-Codex produces impressive results, and can even build highly complex games and applications from scratch over the course of days.

To test the model's web development and long-running agentic capabilities, OpenAI had GPT-5.3-Codex create two games:

A second version of the racing game that shipped with the Codex app, and a diving game.

Using a skill built for developing web games and preselected generic follow-up prompts (such as "fix bugs" or "improve the game"), GPT-5.3-Codex iterated on the games autonomously over millions of tokens.

Racing game: Includes different racers, eight maps, and even power-ups that can be triggered with the spacebar.

Diving game: Players explore various coral reefs, collect fish to complete their fish encyclopedia, and manage oxygen levels.

• Understand your intentions better

Compared to GPT-5.2-Codex, GPT-5.3-Codex is able to understand your intent more accurately when you use it to build everyday websites.

For simple or vague prompts, it now defaults to generating more feature-rich and well-designed websites, providing you with a better starting canvas and helping your ideas come to fruition.

· GPT-5.3-Codex vs GPT-5.2-Codex

For example, you could ask both GPT-5.3-Codex and GPT-5.2-Codex to build a landing page from the same prompt.

GPT-5.3-Codex automatically displays the annual plan as a discounted monthly price, so the discount looks clear and intentional, rather than simply multiplying out the annual total.

In addition, it created an auto-rotating testimonial carousel featuring three different user quotes instead of a single one. This makes the page look more complete by default, more like a product ready to go live.
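The pricing behavior described above comes down to simple arithmetic. The sketch below is only an illustration of that arithmetic, not code from either model's output; the function names and both prices are hypothetical.

```python
def monthly_equivalent(annual_total: float) -> float:
    """Express an annual plan as a per-month price, rounded to cents."""
    return round(annual_total / 12, 2)


def discount_percent(monthly_price: float, annual_total: float) -> int:
    """Savings of the annual plan versus paying monthly for 12 months."""
    full_year = monthly_price * 12
    return round((1 - annual_total / full_year) * 100)


# Hypothetical prices, chosen only for illustration.
MONTHLY_PRICE = 20.0
ANNUAL_TOTAL = 192.0

label = (
    f"${monthly_equivalent(ANNUAL_TOTAL):.2f}/mo billed annually "
    f"(save {discount_percent(MONTHLY_PRICE, ANNUAL_TOTAL)}%)"
)
print(label)  # $16.00/mo billed annually (save 20%)
```

Showing $16.00 per month next to the $20.00 monthly plan makes the saving visible at a glance, whereas a bare $192.00 annual total leaves the comparison to the reader.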

GPT-5.3-Codex

GPT-5.2-Codex

Prompt:

Build a landing page for Quiet KPIs, a founder-friendly weekly metrics digest. The aesthetic is soft SaaS, with glassmorphism cards, a lavender-to-blue gradient, and subtle blur. Sections include: a hero with email capture, a grid of sample report cards, an integrations list row, a customer testimonial carousel, a monthly/annual pricing toggle, FAQs, and a footer.

• Use Satoshi or a similar geometric sans-serif font.

• Buttons have rounded corners with a 14px radius and strong focus states.

• Add a tasteful scroll-reveal effect.

General capabilities beyond coding

Beyond programming's general capabilities

Software engineers, designers, product managers, and data scientists do much more than just generate code.

GPT-5.3-Codex provides support for all stages of the software lifecycle, such as debugging, deployment, monitoring, writing PRDs, editing documentation, user research, testing, and metrics.

Moreover, it can help users build whatever they need, from polished slides to complex data analysis in spreadsheets.

On GDPval, an evaluation of professional knowledge work, GPT-5.3-Codex performs strongly, on par with GPT-5.2.

1. Financial Advice Slides

2. Retail training documents

3. NPV Analysis Spreadsheet

4. Fashion Presentation PDF

• Computer skills

OSWorld is a computer-use benchmark that requires agents to perform productivity tasks in a visual desktop environment.

Here, GPT-5.3-Codex demonstrates computer operation capabilities far exceeding those of previous GPT models.

In OSWorld-Verified, the model uses vision to complete a variety of computer tasks (humans score approximately 72%).

In summary, these results in coding, front-end work, computer operation, and real-world tasks show that GPT-5.3-Codex not only performs better on individual tasks, but also represents a significant step toward a single general-purpose agent.

This means that agents can now reason, build, and execute across the full range of real-world technical work.

Collaborative work, with the ability to interrupt midway

As models become more powerful, the challenge has shifted from "what can agents do" to "how can humans easily interact with, direct, and supervise multiple agents working in parallel."

With GPT-5.3-Codex, Codex gives more frequent updates on what it is doing.

In this way, developers can keep track of key decisions and progress at any time while it is working.

You don't have to wait for the final result; instead, you can interact in real time—ask questions, discuss methods, and guide it toward a solution.

GPT-5.3-Codex will explain its operation to you, respond to your feedback, and keep you synchronized from start to finish.

Self-accelerated iteration, taking over the R&D workflow

Today's Codex understands your intent and, more importantly, understands efficiency.

There's even a kind of "nested" evolution within OpenAI: Codex is accelerating its own creation.

Within just two months, OpenAI researchers and engineers found that the way they work had changed fundamentally.

They used an early version of GPT-5.3-Codex to help train, deploy, and optimize the release version.

The practical results of this round of "self-evolution" are striking:

Research Team

From monitoring training operations and delving into interaction patterns to developing analysis tools for human colleagues, Codex was involved throughout the entire process, not only fixing bugs but also providing suggestions.

Engineering team

For engineers, Codex is the most reliable ally. Whether optimizing the testing framework, locating the root cause of cache failures, or dynamically scaling GPU clusters during traffic surges, it remains stable.

Alpha Testing in Practice

To understand the productivity differences, Codex wrote its own regular expression classifier, ran through massive amounts of logs, and directly generated an accurate report.
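The source does not publish the classifier itself, so the following is only a minimal sketch of what a regular-expression classifier over log lines can look like. The category names, patterns, and sample lines are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical categories and patterns; the real ones are not public.
PATTERNS = {
    "clarification": re.compile(r"\b(can you clarify|what do you mean)\b", re.I),
    "positive": re.compile(r"\b(thanks|great|perfect)\b", re.I),
    "negative": re.compile(r"\b(wrong|broken|failed)\b", re.I),
}


def classify(line: str) -> str:
    """Return the first category whose pattern matches, else 'other'."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return "other"


def summarize(lines) -> Counter:
    """Count how many log lines fall into each category."""
    return Counter(classify(line) for line in lines)


# Hypothetical sample log lines.
logs = [
    "Thanks, that fixed it",
    "This is wrong, the build is broken",
    "Can you clarify which file you changed?",
    "Running tests",
]
report = summarize(logs)
print(dict(report))
```

A classifier like this is crude but cheap to run over very large logs, which is what makes it useful for a first-pass frequency estimate before any deeper analysis.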

Faced with counterintuitive data, it worked with data scientists to build new pipelines. Work that might take a human hours took Codex under three minutes: it extracted key insights from thousands of data points.

More than a programmer: an all-rounder

GPT-5.3-Codex's ambitions have long since moved beyond the code editor.

With this release, Codex is transforming from a simple coding tool into a powerful assistant for operating computers and completing tasks end-to-end.

OpenAI is opening up a broader field: from building software to deep research, complex analysis, and all kinds of desk work.

Once, its goal was to become the "most powerful coding agent"; now, it aims to be a "universal collaborator" on your computer.

The range of work Codex can be applied to has expanded enormously, and with it the ceiling on what we can create.

References:

https://openai.com/index/introducing-gpt-5-3-codex/

https://x.com/OpenAI/status/2019474152743223477

https://x.com/sama/status/2019474754529321247

This article is from the WeChat official account "New Zhiyuan", author: New Zhiyuan, and published with authorization from 36Kr.


