OpenClaw has recently become wildly popular across China. Every major vendor claims its models support the "crayfish", yet no one seems to recommend ChatGPT.
In fact, OpenAI just acquired OpenClaw.
The reason is simply that it's "expensive".
Ask AI to complete a moderately complex task, such as automatically processing a batch of customer emails. Behind the scenes, the model may be called dozens or even hundreds of times: understanding intent, retrieving information, generating drafts, proofreading and polishing, sending the emails one by one. If every step calls the full-fledged GPT-5.4, one round of feed (tokens) costs more than the shrimp themselves.
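The explosion in call volume is easy to see with a back-of-the-envelope sketch. The step names and the per-email pipeline below are illustrative assumptions, not OpenAI's or OpenClaw's actual workflow:

```python
# Minimal sketch of why agent-style workflows multiply model calls.
# Each email passes through every pipeline step, and each step is
# assumed to be one model call (an illustrative simplification).

STEPS = ["understand_intent", "retrieve_info", "draft_reply",
         "proofread", "send"]

def calls_for_batch(n_emails: int) -> int:
    """Total model calls needed to process a batch of emails."""
    return n_emails * len(STEPS)

# Processing just 40 customer emails already means 200 model calls.
total = calls_for_batch(40)
```

At flagship prices, every one of those calls bills flagship rates, which is exactly the cost problem the mini and nano models target.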
Especially with the emergence of agent frameworks like OpenClaw, the working mode of AI has undergone a fundamental change: In the past, when we asked AI a question, it would stare at the question for a long time and then utter a long string of text; now, when faced with a complex task, AI needs to break it down into tiny steps and proceed step by step. Each step calls a flagship large model, which not only results in frustrating latency but also ridiculously high costs.
Against this backdrop, OpenAI officially released two small models, GPT-5.4 mini and nano, claiming them to be the company's "most powerful small models to date."
Small as they are, these two models have all the essentials. Don't underestimate them: they inherit the core capabilities of GPT-5.4 while being faster and more resource-efficient, making them well suited to large-scale, high-frequency AI task calls.
OpenAI apparently felt that the mini wasn't small enough, so they created the even lighter nano.
nano is the lightest and fastest version of GPT-5.4, designed for tasks with extremely high speed and cost requirements.
The fact is that using a single model for every task is inefficient, and often amounts to using a sledgehammer to crack a nut. A better arrangement is to have a large model set the direction of the task while small models handle large-scale, rapid execution.
OpenAI's own Codex does just that.
A main model is responsible for understanding the task intent and breaking it into steps, then schedules mini/nano-level sub-agents to perform the specific code modifications, test runs, and result verification. Each sub-task costs very little.
The large model is like a commander-in-chief who sits in the center of the army, strategizing and directing all resources. The small models are like countless elite light cavalry units, agile, swift, and deployed in large numbers to the front lines, dedicated to completing specific tasks.
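The commander-and-cavalry division of labor can be sketched as a simple plan/execute split. The model names come from the article; the two functions are stubbed stand-ins for real API calls, and the three-step plan is an assumption for illustration:

```python
# "Commander + light cavalry": one expensive planning call by the large
# model, then many cheap execution calls by the small model.

def plan_with_flagship(task: str) -> list[str]:
    # In production this would be a single call to the flagship GPT-5.4,
    # which decomposes the task into concrete steps.
    return [f"{task}: step {i}" for i in range(1, 4)]

def execute_with_nano(step: str) -> str:
    # In production this would be a fast, cheap call to GPT-5.4 nano.
    return f"done({step})"

def run(task: str) -> list[str]:
    steps = plan_with_flagship(task)              # one expensive call
    return [execute_with_nano(s) for s in steps]  # many cheap calls

results = run("refactor module")
```

The cost profile follows directly: the flagship is billed once per task, while the high-frequency per-step traffic all lands on the cheapest tier.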
Why OpenAI did this
The mini and nano models are indeed "mini" and "nano" in price. To understand why OpenAI is betting on lightweight design, let's first look at just how cheap these two models are.
Both the mini and nano versions support context windows of 400,000 tokens. Regarding input costs, the GPT-5.4 flagship version is $2.5 per million tokens, the mini version is $0.75 per million tokens, and the nano version is even more impressive at only $0.2 per million tokens, making the input cost only 8% of the flagship GPT-5.4 model.
In terms of output price, GPT-5.4 is $15 per million tokens, the mini version is about 1/3 of that ($4.50), and the nano version is about 1/12 of that ($1.25).
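The quoted ratios check out against the per-million-token prices. A small calculator, using only the numbers given above:

```python
# Prices per 1M tokens from the article: (input $, output $).
PRICES = {
    "gpt-5.4":      (2.50, 15.00),
    "gpt-5.4-mini": (0.75,  4.50),
    "gpt-5.4-nano": (0.20,  1.25),
}

def cost(model: str, in_tok: int, out_tok: int) -> float:
    """Dollar cost of one call, given input and output token counts."""
    p_in, p_out = PRICES[model]
    return (in_tok * p_in + out_tok * p_out) / 1_000_000

# nano input is 8% of flagship input; nano output is about 1/12 of flagship.
nano_input_ratio = PRICES["gpt-5.4-nano"][0] / PRICES["gpt-5.4"][0]
nano_output_ratio = PRICES["gpt-5.4-nano"][1] / PRICES["gpt-5.4"][1]
```

For example, a call consuming 1M input and 1M output tokens costs $17.50 on the flagship but only $1.45 on nano.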
In other words, OpenAI users' bills have been cut dramatically.
Price is just a superficial factor; what truly drives OpenAI to do this is a fundamental shift in the usage trends across the entire industry.
Lightweight small models are characterized by low calling costs and fast response times. There is considerable evidence that they have become the most cost-effective option, with the greatest growth potential, for putting AI into production.
In OpenRouter’s top ten most popular LLMs this month, lightweight models occupy six spots. Their parameter counts are generally in the range of billions to tens of billions, which is in stark contrast to general flagship models like Claude Opus, which have hundreds of billions or even trillions of parameters.
The top two spots on the list were occupied by lightweight small models. MiniMax M2.5 ranked first with a call volume of 8.29T tokens, leading the entire list by a wide margin, with a monthly increase of 476%; Google Gemini 3 Flash Preview ranked second, with a call volume of 4.24T tokens, far exceeding most general flagship large models.
Hugging Face Hub's model download statistics also confirm this trend: 92.48% of downloads came from models with fewer than 1 billion parameters, 86.33% from models with fewer than 500 million parameters, and 69.83% from models with fewer than 200 million parameters.
Large-model production, too, has entered an era of thin margins and high volume.
Models with 1B+ parameters, even including several highly popular open-source models, only accounted for 7.52% of total downloads, less than one-tenth of lightweight models. This suggests that the high level of attention given to large models has not translated into actual implementation and adoption.
From OpenAI's own business perspective, creating small models is an essential task.
In late February of this year, OpenAI announced that ChatGPT had surpassed 900 million weekly active users globally, with approximately 50 million paying users. The paid conversion rate was only about 5%, meaning the vast majority of users remained on the free version. This represents the core growth potential for its future commercialization.
The vast majority of C-end paying users primarily use the service for high-frequency, lightweight needs such as daily conversations, copywriting, information retrieval, and lightweight code writing.
These scenarios do not require the extreme reasoning capabilities of flagship models like GPT-5.4. Lightweight models with fewer than 10 billion parameters are enough to cover most of these needs while delivering millisecond-level responses and a queue-free experience, which matches the core demands of the vast majority of users.
Having said so many "whys," let's see what kind of results these two models actually deliver—after all, if feed becomes cheaper but the size of the shrimp also shrinks, that's not called cost reduction and efficiency improvement, that's called cutting corners.
What are the capabilities of the mini and nano models?
Are the advantages of mini and nano just that they are small and cheap?
No, no, no.
According to a series of benchmark tests on the OpenAI website, their performance is quite outstanding.
In SWE-bench Pro, the most authoritative AI programmer test in the industry, the GPT-5.4 mini achieved an accuracy of 54.4%, which is extremely impressive and close to the 57.7% accuracy of the full-fledged GPT-5.4.
GPT-5.4 nano achieved 52.4% accuracy; given its extremely low cost, it is ideal as a rapidly iterating code-review and auxiliary sub-agent.
Two charts in the announcement make this more intuitive. The horizontal axes represent the model's response time and cost, respectively, while the vertical axis represents the model's accuracy on the task.
While GPT-5.4 consistently ranks first in accuracy, its graph extends too far on the horizontal axis, meaning it not only takes longer to process data but also costs more. In contrast, the lines for the nano and mini models are generally located on the left side of the graph, indicating their extremely high cost-effectiveness.
They sacrificed only a tiny bit of the ultimate logical limit in exchange for extremely fast response speed and extremely low cost.
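Combining the article's SWE-bench Pro accuracies with the output prices gives a rough "accuracy per output dollar" figure. This metric is an illustrative construction of our own, not something OpenAI publishes:

```python
# SWE-bench Pro accuracy (%) and output price ($ per 1M tokens),
# both taken from the figures quoted in this article.
SWE_BENCH_PRO = {"gpt-5.4": 57.7, "gpt-5.4-mini": 54.4, "gpt-5.4-nano": 52.4}
OUTPUT_PRICE  = {"gpt-5.4": 15.00, "gpt-5.4-mini": 4.50, "gpt-5.4-nano": 1.25}

# Accuracy points per dollar of output tokens: a crude value-for-money proxy.
value = {m: SWE_BENCH_PRO[m] / OUTPUT_PRICE[m] for m in SWE_BENCH_PRO}
best = max(value, key=value.get)
```

By this crude proxy, nano delivers roughly ten times the accuracy-per-dollar of the flagship, which is exactly the "tiny accuracy loss, huge cost win" tradeoff described above.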
In response, many netizens joked: "The cost of crayfish feed has finally been reduced."
Indeed, mini and nano models may become the mainstream API choices for shrimp farming in the future.
In OSWorld-Verified (real-world computer environment operation test), the GPT-5.4 mini achieved an accuracy rate of 72.1%, almost matching the 75% accuracy of the full-fledged flagship version.
The main purpose of this test is to enable the AI to use a real computer like a human, by looking at the monitor, moving the mouse, and typing on the keyboard, including using software such as Chrome, Office, and VS Code.
This is the metric that OpenClaw and other agent players value most.
In the past, when AI controlled a computer it often mis-clicked or reacted slowly. The mini's high score means it recognizes buttons, sliders, and input boxes with high accuracy, making it far more adept at automating tasks.
However, small models are not suitable for all scenarios.
The nano model scored only 39.0% on OSWorld-Verified, even lower than the previous generation GPT-5 mini's 42.0%.
This means that nano still falls short in complex tasks that require precise manipulation of the computer interface.
Similarly, for highly complex tasks requiring deep reasoning and long chains of logic, the flagship version of GPT-5.4 remains irreplaceable.
The value of a small model lies not in replacing a large model, but in using it in conjunction with a large model—putting the right model in the right place is the true essence of sub-agent architecture.
This is precisely the deeper meaning behind the release of the nano and mini. They are not here to steal the flagship version's thunder, but to help the flagship version share the burden of "using a cannon to kill a mosquito".
When the large model no longer needs to handle every trivial step personally, the efficiency and cost structure of the entire system will undergo a qualitative change.
OpenAI's intention is not a simple price war. Its thinking is more like: "I can earn less from you per token, as long as you use my small models more and the total revenue grows."
A typical example of low profit margins and high sales volume.
In the past, "cheapness" was the moat protecting domestically produced models, but this moat is being eroded. For ordinary developers and enterprise users, AI may soon become a new, affordable, and readily available infrastructure across various industries.
With the cost of crayfish feed decreasing, the barrier to entry for crayfish farming is quietly lowering. The next question is: who can raise the fattest crayfish?
This article is from the WeChat public account "Alphabet AI" , author: Liu Yijun, and published with authorization from 36Kr.