OpenAI's next move after GPT-5.4: the harness that powers Codex is now fully available.

36kr, 04-16

OpenAI has quietly made another shrewd move.

Just now, the Agents SDK underwent a complete architectural rewrite.

A native harness, a native sandbox, Codex-grade file system tools, plus one-click integrations with seven leading sandbox vendors.

When GPT-5.4 made its high-profile debut in early March with native computer use, developers were already complaining about one thing.

The model could operate a computer, but where the agent runs, and how to make sure it runs reliably, still required piecing together a framework from scratch.

OpenAI filled that gap itself tonight.

In short, OpenAI has transformed its Agents SDK from a "toy for chatbots" into a "foundation for production-grade agents."

The harness is responsible for control flow, model invocation, tool routing, and pause/resume; the sandbox is responsible for reading and writing files, installing dependencies, and running code, with the two layers completely decoupled.

More ruthlessly, this move also lands a blow on third-party agent frameworks such as LangChain, CrewAI, and LangGraph.

OpenAI has built the infrastructure layer, leaving visibly less room for third parties.

From "Chatbot Toy" to Production-Grade Foundation

Before discussing this upgrade, we need to understand what the original Agents SDK looked like.

In March 2025, OpenAI launched its Agents SDK for the first time, highlighting that it was lightweight, low-abstraction, and able to run with just a few lines of Python.

However, that version of the SDK was essentially designed for chatbot scenarios.

More than a year has passed, and the model's capabilities have improved dramatically: it can now run for hours, days, even weeks at a stretch.

The SDK originally designed for chatbots is now outdated.

This rewrite mainly involved two things.

The first is equipping the model with a complete operating framework: the harness.

Configuration-based memory, sandbox-aware orchestration, Codex-style file system tools, tool calling via MCP, progressive information disclosure via skills, custom instructions via AGENTS.md, code execution via shell tools, file editing via the apply_patch tool: all of it is packaged into the SDK with native support.
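The release doesn't publish the harness's internals, but the concept is easy to sketch: a loop that calls the model, routes tool calls, and stops when the model produces a final answer. The toy below uses invented names and a scripted fake model; it is not the Agents SDK's real API.

```python
# Toy harness loop: model calls, tool routing, and a stop condition.
# All names here are illustrative, not the real Agents SDK API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Harness:
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, model, prompt: str, max_turns: int = 8) -> str:
        messages = [{"role": "user", "content": prompt}]
        for _ in range(max_turns):
            reply = model(messages)                  # one model call
            if reply["type"] == "final":             # model says it's done
                return reply["content"]
            tool = self.tools[reply["tool"]]         # route the tool call
            result = tool(reply["args"])
            messages.append({"role": "tool", "content": result})
        raise RuntimeError("agent did not finish within max_turns")

# A scripted fake model: first asks for the shell tool, then answers.
def fake_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool", "tool": "shell", "args": "echo hi"}
    return {"type": "final", "content": "done: " + messages[-1]["content"]}

harness = Harness(tools={"shell": lambda cmd: f"ran `{cmd}`"})
print(harness.run(fake_model, "say hi"))  # done: ran `echo hi`
```

Everything in the feature list above (memory, skills, MCP, approvals) hangs off a loop of roughly this shape; the SDK's value is shipping a production-hardened version of it.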

Developers familiar with Claude Code and Codex will find this list very familiar.

That's right. This time, OpenAI has taken the pitfalls Codex hit and the best practices it accumulated over the past year, and productized them straight into the SDK.

The second thing is to completely separate harness and compute.

The harness runs within your trusted infrastructure, managing model calls, approvals, tracing, and runtime state. Compute is a separate sandbox specifically responsible for reading and writing files, running commands, packaging, and outputting artifacts.

With a standardized interface between the two layers, API keys and sensitive credentials never enter the environment where model-generated code actually executes.

As a result, the sandbox holds no API keys and no sensitive credentials. It is fully isolated and can even be cut off from the network entirely, with no outbound traffic.
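The trust boundary can be illustrated with a toy pair of classes (hypothetical names; this is a sketch of the principle, not the SDK's interfaces): the harness side owns the credential, and only command strings cross into the compute side.

```python
# Toy illustration of the harness/compute trust boundary.
# The harness holds credentials; the sandbox only ever sees commands.
class Sandbox:
    """Isolated compute: runs commands, reads/writes files. No secrets."""
    def __init__(self):
        self.files: dict[str, str] = {}

    def exec(self, command: str) -> str:
        # Stand-in for running model-generated code in isolation.
        return f"executed: {command}"

class Harness:
    """Trusted side: owns the API key, model calls, approvals."""
    def __init__(self, api_key: str, sandbox: Sandbox):
        self._api_key = api_key      # never crosses into the sandbox
        self.sandbox = sandbox

    def run_step(self, command: str) -> str:
        # Only the command string crosses the boundary.
        return self.sandbox.exec(command)

sb = Sandbox()
h = Harness(api_key="sk-secret", sandbox=sb)
out = h.run_step("pip install pandas")
# The sandbox's state contains no trace of the credential.
assert "sk-secret" not in repr(vars(sb))
print(out)  # executed: pip install pandas
```

The point of the design is exactly this asymmetry: even if model-generated code inside the sandbox is malicious, there is no secret in its environment to exfiltrate.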

This is not a minor tweak to security features. It's a paradigm shift in the entire Agent architecture.

A 900-page insurance file extracted at 100%, and half of one company's PRs written by agents

The first result of the harness/compute split: the sandbox-vendor ecosystem map filled in overnight.

In this release, seven sandbox vendors (Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel) were simultaneously added to the official support list.

The key to seven companies plugging in simultaneously is an abstraction layer OpenAI provides called the Manifest: a configuration describing the agent's workspace.

This manifest specifies which local files to mount, which cloud storage to pull data from, and where to write the artifacts. It covers AWS S3, Google Cloud Storage, Azure Blob Storage, and Cloudflare R2.

Most importantly, this Manifest is decoupled from specific sandbox suppliers.

Write an agent on E2B today, switch it to Modal tomorrow: no code rewrite, just one line of configuration. Pick whichever sandbox is cheapest or closest to your data center.
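The article's description suggests a manifest shaped roughly like the following. The field names here are invented for illustration (consult the SDK docs for the real schema); the point is that the provider is a single field, so swapping E2B for Modal touches one line.

```python
# Hypothetical provider-agnostic workspace manifest, as a dataclass.
# Field names are illustrative, not the SDK's real schema.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Manifest:
    provider: str                  # e.g. "e2b", "modal", "daytona"
    mounts: tuple[str, ...]        # local directories to mount
    cloud_inputs: tuple[str, ...]  # e.g. S3 / GCS / R2 URIs to pull from
    artifact_dir: str              # where the agent writes its outputs

m = Manifest(
    provider="e2b",
    mounts=("./reports",),
    cloud_inputs=("s3://finance-data/fy2025/",),
    artifact_dir="./out",
)

# Switching sandbox vendors is one field; everything else is untouched.
m2 = replace(m, provider="modal")
print(m2.provider)   # modal
print(m2.mounts)     # ('./reports',)
```

Because the manifest is frozen and provider-agnostic, the same workspace description can be replayed against any vendor on the support list.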

The official documentation provides a minimal example. It runs an agent in a local sandbox, attaches a financial reporting directory, and compares three financial metrics from FY2025 and FY2024. The core code is less than 20 lines.
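The official snippet is not reproduced here. As a stand-in, the toy below does the non-model part of that task with invented file names and metric values: it treats a temporary directory as the mounted report workspace and diffs three metrics between FY2024 and FY2025.

```python
# Toy stand-in for the docs example: compare three financial metrics
# across two fiscal years from files in a mounted workspace.
# File layout and metric values are invented for illustration.
import json, tempfile, pathlib

workspace = pathlib.Path(tempfile.mkdtemp())
(workspace / "fy2024.json").write_text(json.dumps(
    {"revenue": 120.0, "gross_margin": 0.61, "free_cash_flow": 18.5}))
(workspace / "fy2025.json").write_text(json.dumps(
    {"revenue": 151.0, "gross_margin": 0.64, "free_cash_flow": 27.2}))

def compare_metrics(ws: pathlib.Path, metrics: list[str]) -> dict[str, float]:
    old = json.loads((ws / "fy2024.json").read_text())
    new = json.loads((ws / "fy2025.json").read_text())
    # Year-over-year delta for each requested metric.
    return {m: round(new[m] - old[m], 4) for m in metrics}

deltas = compare_metrics(
    workspace, ["revenue", "gross_margin", "free_cash_flow"])
print(deltas)  # {'revenue': 31.0, 'gross_margin': 0.03, 'free_cash_flow': 8.7}
```

In the real example, the model drives this comparison via the SDK's file system tools inside the sandbox; this sketch just shows how little workspace plumbing the task itself needs.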

Two other new capabilities are particularly critical for long-running tasks: snapshots and state recovery, which let sandbox containers resume from a checkpoint even after a failure; and multi-sandbox parallelism with isolated sub-agent environments, which addresses scalability.

Thus, for the first time, agents gained native abilities to "recover from a disconnection" and "work in parallel clones".
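The checkpoint-and-resume idea can be sketched in a few lines. This is an in-memory toy, not the SDK's snapshot API, which persists real container state: the state is copied at a checkpoint, then restored after a simulated crash.

```python
# Toy snapshot/restore for sandbox state: checkpoint, crash, resume.
# In-memory stand-in; the real SDK persists sandbox container state.
import copy

class ToySandbox:
    def __init__(self):
        self.state = {"files": {}, "step": 0}

    def work(self, name: str, content: str):
        self.state["files"][name] = content
        self.state["step"] += 1

    def snapshot(self) -> dict:
        return copy.deepcopy(self.state)   # checkpoint the full state

    def restore(self, snap: dict):
        self.state = copy.deepcopy(snap)   # resume from the checkpoint

sb = ToySandbox()
sb.work("plan.md", "step 1 done")
checkpoint = sb.snapshot()

sb.work("scratch.py", "half-finished work")  # ...then the container dies
sb.restore(checkpoint)                       # relaunch from the checkpoint

print(sb.state["step"])           # 1
print(sorted(sb.state["files"]))  # ['plan.md']
```

For a task that runs for days, the difference between this and "start over from zero" is the difference between a usable agent and an expensive demo.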

In a lengthy technical article, Erik Dunteman, a member of the Modal technical team, casually revealed a detail:

Ramp has already built a large army of backend coding agents using Modal, and more than half of the company's PRs are created by these agents themselves.

Furthermore, Stripe also disclosed earlier this year that its internal AI Agent generates more than 1,000 PRs per week.

What the two companies have in common is that after acquiring mature agent infrastructure, their business teams experienced a dramatic leap in productivity.

Today, OpenAI has turned this infrastructure, once available only to top companies, into an out-of-the-box default in its SDK.


In response, FurtherAI CTO Sashank Gondala revealed that their agents had successfully extracted over 900 pages of insurance claim records with a 100% success rate.

Over 900 pages, 100%, insurance claim records: put those three together and veteran insurance professionals immediately understand the value. Claim records are among the hardest documents in the industry to parse; it used to be routine for extraction to crash on some page.

Tomoro AI R&D engineer Douglas Adams provided another hard number: building an agent with the same capabilities now takes one-sixth the code it did before.

Carter Rabasa, Box's developer relations manager, fed the agent real business data and gave it bash/python as tools, letting it run a complete invoice-reconciliation workflow inside the sandbox.

Unexpectedly, the first round of testing went smoothly.

The sandbox is perfect for running the code generated by the agent.

OpenAI is getting involved in infrastructure development, leaving LangChain and its ilk nowhere to hide.

At this level, the true impact of this release on the industry becomes apparent.

How did third-party agent frameworks like LangChain, LangGraph, CrewAI, and AutoGen survive in the past year? The answer is that they filled the gap in OpenAI's native SDK, making it "production-ready".

Orchestration, memory management, guardrails, tracking, and multi-agent collaboration are the main battlegrounds for third-party frameworks.

Now, OpenAI has taken over all these main battlegrounds at once.

What they are doing is building the infrastructure layer of the agent world. From now on, third-party frameworks either move up the stack (orchestration, vertical scenarios) or down it (dedicated sandboxes, dedicated tools). The floor in between has been poured by OpenAI itself.

Moreover, OpenAI's claim of "compatibility with all sandbox service providers" is itself an attempt to include sandbox providers within OpenAI's ecosystem.

Today they may be OpenAI's partner, but tomorrow they may simply be a "component supplier" within the OpenAI ecosystem.

Python is ahead, TypeScript is still in the queue.

Not that all of this is perfect yet.

The new capabilities of harness and sandbox were initially released only in Python, with the TypeScript version planned for a later update; the SDK is still stuck at version 0.YZ.

But the direction is already very clear.

GPT-5.4 arrives with native computer use, and the Agents SDK provides it with a complete runtime environment.

The next step is simply to have more developers build their business logic onto this infrastructure.

From this point on, startups developing agent frameworks will re-evaluate their positioning. Sandbox vendors will start calculating whether they can handle OpenAI's traffic. Teams developing business-layer agent applications will consider whether to migrate.

On the day GPT-5.4 was released, some people described it as "a routine upgrade without any surprises".

Looking back from 40 days later, the real surprise has only arrived today.

References:

https://techcrunch.com/2026/04/15/openai-updates-its-agents-sdk-to-help-enterprises-build-safer-more-capable-agents/

https://modal.com/blog/building-with-modal-and-the-openai-agent-sdk

https://openai.com/index/the-next-evolution-of-the-agents-sdk/

https://x.com/OpenAIDevs/status/2044466699785920937

https://x.com/snsf/status/2044514160034324793

This article is from the WeChat official account "New Zhiyuan", edited by Haokun, and republished by 36Kr with authorization.
