Today, GPT-5.4 was released, and the familiar OpenAI is back.
GPT-5.4 is a new frontier model that folds OpenAI's progress in reasoning (GPT-5.2), top-tier programming (GPT-5.3-Codex), and native computer-use capabilities into a single model.
This release is significant. "Native computer operation" alone is eye-catching; combined with top-tier professional knowledge, a 1-million-token context window, and significantly improved tool efficiency, it represents a genuine leap for anyone who works with AI, collaborates with AI, or builds systems on top of AI.
Is GPT-5.4 stealing the entry point from OpenClaw?
The biggest change in this new model is the arrival of native computer operation capabilities. OpenAI stated that GPT-5.4 is its "first general-purpose model with native computer operation capabilities."
On the OSWorld-Verified computer-use benchmark, accuracy improved from 47.3% to 75%, while on BrowseComp it improved from 65.8% to 82.7%.
This is not just about running a few shell commands. Its real significance is that the model can access your desktop, visit web pages, and do many of the things on your computer that previously only humans did, which are typically things the web version of ChatGPT cannot do at all.
Products like OpenClaw have surged in popularity in recent months, even weeks, precisely because they transformed how we use AI models. Previously, we mostly interacted with models through web apps, with little real involvement on our local machines. Now that has fundamentally changed.
From the examples provided by OpenAI, we can see that GPT-5.4 can skillfully use a computer, including viewing browser user interface screenshots, clicking on interfaces, sending emails, and scheduling calendar events.
Another new experimental feature, "Playwright (Interactive)," allows Codex to perform real-time visual debugging of web and Electron applications, and even test them directly while building applications—all thanks to its native computer operation capabilities.
OpenAI researcher SQ Mah stated that this is mainly supported by two key capabilities: CUA (computer use) and the ability to generate high-quality websites from image input.
Compared to GPT-5.3-Codex, GPT-5.4 no longer requires launching a completely new environment to perform operations with CUA. In a 3D chess demo, CUA automatically clicks on the game interface, moves the pieces, and even verifies through actual operations that the rules are applied correctly.
In the website-generation scenario, the model calls the image gen tool to generate images, then uses CUA to check its own work: it opens the generated images, inspects their content, opens the website page, and compares the two side by side to ensure the generated site matches the input image as closely as possible.
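OpenAI has not published the CUA action schema in this article, but the "check your own work" pattern it describes reduces to a screenshot → decide → act loop. The sketch below is a hypothetical illustration of that control flow only; `Action`, `FakeModel`, and `FakeDesktop` are stand-ins invented here, not OpenAI's actual API.

```python
from dataclasses import dataclass

# Hypothetical action type; the real CUA schema is not public in this article.
@dataclass
class Action:
    kind: str       # e.g. "click", "type", "done"
    payload: dict

def run_cua_loop(model, desktop, max_steps=10):
    """Generic screenshot -> decide -> act loop.

    `model` and `desktop` are stand-ins for the model endpoint and the OS
    automation layer; only the control flow reflects the article.
    """
    history = []
    for _ in range(max_steps):
        shot = desktop.screenshot()               # capture current UI state
        action = model.next_action(shot, history)  # model picks the next step
        history.append(action)
        if action.kind == "done":                  # model judges the task finished
            break
        desktop.execute(action)                    # click / type / scroll, etc.
    return history

# Tiny scripted stand-ins so the loop can be exercised offline.
class FakeDesktop:
    def screenshot(self):
        return "pixels"
    def execute(self, action):
        pass

class FakeModel:
    def __init__(self):
        self.script = [
            Action("click", {"x": 120, "y": 80}),
            Action("type", {"text": "hello"}),
            Action("done", {}),
        ]
    def next_action(self, shot, history):
        return self.script[len(history)]

history = run_cua_loop(FakeModel(), FakeDesktop())
```

The key design point the article highlights is persistence: the loop keeps acting in the same environment rather than spinning up a fresh one per operation.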
SQ Mah also emphasized that, through persistent CUA, they found that in some scenarios where models tested their own work, token usage actually decreased by two-thirds.
In fact, OpenAI launched CUA as early as January last year, but over concerns about security and accuracy, the project never gained much traction.
At one point, it even led some to wonder whether OpenAI had abandoned the approach. Especially during the period when releases like GPT-4o absorbed almost all the attention, CUA essentially disappeared from view.
One user commented: "Have they abandoned this project? There's been absolutely no news about it. I've actually been using it via Azure/OpenAI, where it has been in preview for several months; I applied but still haven't been approved. Access is so restricted that many people can't even try it. Still, I don't believe this approach has failed. Once a browser-first solution truly matures in stability, stealth, and built-in security mechanisms, it could represent a major leap forward for agent workflows."
However, judging from today's release of GPT-5.4, the situation has clearly changed. OpenAI has not only brought this capability back to the forefront, but has also released some new CUA sample apps on GitHub.
CUA lets GPT-5.4 directly use our computers, which is very similar to OpenClaw's approach: essentially, everyone is vying for the same entry point, letting AI operate the computer directly rather than being limited to APIs and chat windows. But compared to computer-use frameworks like OpenClaw that are built outside the model, GPT-5.4 takes a more direct route: it bakes computer operation natively into the model.
As these models begin to overtake open-source projects like OpenClaw, companies with annual revenues in the tens of millions, hundreds of millions, or even billions can easily build their own versions of OpenClaw: more secure, faster, and more reliable. That makes this a genuinely exciting stage for agentic AI capabilities.
Lower costs on one hand, fewer hallucinations on the other.
This upgrade clearly caters to developers and heavy users, one key reason being that GPT-5.4 introduces tool search: instead of stuffing the complete definitions of every tool into the context on each request (which could burn tens of thousands of extra tokens per call), the model receives only a lightweight list and retrieves a tool's full definition on demand when it is needed.
In Scale's MCP Atlas benchmark, with 36 MCP servers enabled and 250 tasks tested, the tool-search configuration reduced total token usage by 47% without compromising accuracy. For developers building large agent systems, this is almost equivalent to lower costs and faster response times.
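The mechanism can be illustrated with a minimal sketch. The tool names, schemas, and the 4-characters-per-token estimate below are all assumptions invented for illustration; only the two-tier structure (cheap always-loaded list, full JSON schema fetched on demand) reflects what the article describes.

```python
import json

# Hypothetical tool registry illustrating "tool search": every request
# carries only the lightweight list; a tool's full JSON schema is fetched
# only when the model actually decides to call it.
FULL_DEFINITIONS = {
    "send_email": {
        "name": "send_email",
        "description": "Send an email on the user's behalf.",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    },
    "create_event": {
        "name": "create_event",
        "description": "Add an event to the user's calendar.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "start": {"type": "string", "description": "ISO 8601"},
                "end": {"type": "string", "description": "ISO 8601"},
            },
            "required": ["title", "start"],
        },
    },
}

def lightweight_list():
    # Only names and one-line summaries go into every request.
    return [{"name": n, "summary": d["description"]}
            for n, d in FULL_DEFINITIONS.items()]

def get_definition(name):
    # Retrieved on demand when the model wants to call this tool.
    return FULL_DEFINITIONS[name]

def rough_tokens(obj):
    # Crude estimate: roughly 4 characters per token.
    return len(json.dumps(obj)) // 4

always_loaded = rough_tokens(lightweight_list())          # cost paid every call
everything = rough_tokens(list(FULL_DEFINITIONS.values()))  # old always-on cost
```

With dozens of MCP servers attached, the gap between `always_loaded` and `everything` is what produces savings on the order of the 47% reported above.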
Hallucinations have also decreased significantly. According to OpenAI, GPT-5.4 makes 33% fewer errors in individual factual statements than GPT-5.2, and the overall probability of an erroneous response drops by 18%: a very useful upgrade for professional users who rely on accurate output.
Meanwhile, GPT-5.4 achieved an accuracy rate of 91% in Harvey's BigLaw Bench.
Programming capabilities have also improved.
GPT-5.4 has now become OpenAI's main programming model—in most tasks, you no longer need to struggle between ChatGPT and Codex.
It performs on par with or better than GPT-5.3-Codex on SWE-Bench Pro, and it is also faster, especially at lower reasoning-effort settings. Within a conversation you can start coding immediately, with no extra model selection.
Codex also adds a fast mode, delivering up to 1.5x speedup across all supported models. OpenAI further emphasized that GPT-5.4 is significantly stronger on complex front-end tasks, producing output that is more polished and visually appealing as well as more functionally correct, which feedback from many developers has already confirmed.
With the upgrade in capabilities, the price has also increased.
In the API documentation, OpenAI specifies that the model name for GPT-5.4 Thinking is gpt-5.4, while GPT-5.4 Pro is gpt-5.4-pro. Pricing is as follows:
GPT-5.4:
Input: $2.50 per 1 million tokens
Output: $15 per 1 million tokens
GPT-5.4 Pro:
Input: $30 per 1 million tokens
Output: $180 per 1 million tokens
Overall, compared to other models currently on the market, GPT-5.4's API cost is relatively high.
Another important change: in GPT-5.4, if a request's input exceeds 272,000 tokens, it is billed at double the normal price, reflecting support for a larger prompt context than previous models.
In Codex, the default compaction limit is 272k tokens, and the higher long-context price only kicks in when input exceeds 272k. Developers incur no additional fees as long as they keep prompts within this range; if a longer context is needed, the compaction limit can be raised, and only those larger requests are billed at the higher rate.
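The billing rules above can be sketched numerically with the quoted prices. One assumption to flag: the article's wording suggests the entire input is billed at double the rate once it crosses 272k tokens; the actual mechanics (for instance, whether only the excess is doubled) may differ.

```python
# Prices quoted in the article, in USD per 1 million tokens.
PRICES = {
    "gpt-5.4":     {"input": 2.50, "output": 15.0},
    "gpt-5.4-pro": {"input": 30.0, "output": 180.0},
}
LONG_CONTEXT_THRESHOLD = 272_000  # tokens

def estimate_cost(model, input_tokens, output_tokens):
    """Estimated request cost in USD under the assumption stated above."""
    p = PRICES[model]
    # Assumption: whole input billed at 2x once the threshold is exceeded.
    input_rate = p["input"] * (2 if input_tokens > LONG_CONTEXT_THRESHOLD else 1)
    return (input_tokens * input_rate + output_tokens * p["output"]) / 1_000_000

# 100k input / 10k output on gpt-5.4: 0.1 * $2.50 + 0.01 * $15 = $0.40
normal = estimate_cost("gpt-5.4", 100_000, 10_000)

# 300k input crosses 272k, so the input rate doubles to $5.00/1M: $1.50
long_ctx = estimate_cost("gpt-5.4", 300_000, 0)
```

A quick sanity check like this makes it easy to see why keeping prompts under the 272k compaction limit matters for cost-sensitive agent workloads.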
An OpenAI spokesperson also stated that the maximum output length in the API is 128,000 tokens, consistent with previous models.
As for why GPT-5.4 has a higher base price, OpenAI explains there are three main reasons:
Significantly enhanced capabilities in complex tasks, including programming, computer operation, in-depth research, advanced document generation, and tool usage;
A series of research breakthroughs derived from the OpenAI technology roadmap;
It is more efficient at reasoning and requires fewer reasoning tokens to complete the same task.
They also emphasized that even after the price increase, GPT-5.4 is still priced lower than many frontier models in its class.
Reference links:
https://openai.com/zh-Hans-CN/index/computer-using-agent/
https://www.reddit.com/r/OpenAI/comments/1mwc03q/openai_computer_user_agent_cua/
https://venturebeat.com/technology/openai-launches-gpt-5-4-with-native-computer-use-mode-financial-plugins-for
This article is from the WeChat official account "InfoQ" , translated by Tina, and published with authorization from 36Kr.


