A dark horse emerges in the edge AI that Apple is after: the first cognitive model is born, with 4B parameters matching GPT-5.4.

06-09



[Introduction] At the recent WWDC, Apple's Siri was framed around the theme of an "AI-driven rebirth," underscoring the growing trend toward "edge-side models." Earlier, Andrej Karpathy advocated stripping the knowledge out of models and retaining only the "cognitive core." A Chinese company claims to have implemented this approach, using 4B parameters to match the performance of massive models with hundreds of billions of parameters in swarm intelligence tasks. What exactly can edge-side cognitive models change?

Last night, Siri was reborn using Google's Gemini, which has 1.2 trillion parameters.

On the other hand, Amazon shut down its internal AI leaderboard, which had sparked huge controversy—employees were using AI tools extensively, and computing costs had skyrocketed to the point that management couldn't sit still.

Token costs have become the toughest hurdle for the large-scale deployment of AI.

In a previous interview, Andrej Karpathy suggested a direction: strip away the massive amount of knowledge in the model and retain only a "cognitive core" that can think, plan, and know what it doesn't know; 1B level parameters would be enough.

https://www.youtube.com/watch?v=lXUZvyajciY

This direction is being validated.

A 4B parameter model achieved results equivalent to large-scale models with hundreds of billions of parameters, such as GPT-5.4, in swarm intelligence tasks, and supports edge deployment.

It comes from a founding team that once topped the Japanese Hugging Face rankings with a 3.6B-parameter model, beating the 65B-parameter Llama.

This time, they created the industry's first edge-side cognitive model.

Karpathy's prophecies and computing power bills

The pressure of computing power costs has shifted from a technical issue to a financial one, and Amazon's case is just one example.

Amazon employees frequently used internal AI tools to access the inference capabilities of large models, driving up overall computing power expenditures. Management had to urgently halt the leaderboard mechanism to curb usage.

https://www.ft.com/content/b1a62a7f-6df5-4c90-94ce-64ce9c9961b6?syn-25a6b1a6=1

The industry is experiencing its first "token retreat," with some companies consuming hundreds of millions of yuan in computing power per day.

Large-scale commercial models are hitting a structural wall: the stronger the capabilities and the deeper the reasoning chain, the higher the cost of a single call.

The GPU cost-to-revenue ratio is a critical metric for all AI companies, and the trend toward ever-larger parameter counts will only make this metric look worse.

Karpathy's approach points to another path: he proposes stripping the "memory/knowledge" out of the model and retaining what he calls the "cognitive core":

An entity stripped of massive amounts of facts and knowledge, but retaining its thinking algorithms, the magic of intelligence, and problem-solving strategies.

He concluded that even with a scale of 1 billion parameters, efficient human-like thinking could be achieved:

It can think like a human... If you ask it a factual question, it may need to do research—it knows it doesn't know, and it will look it up.

This statement sparked widespread discussion in the tech community.

A consensus is forming on the direction, but the real variable is which team can move the "cognitive core" from a concept to a deployable product.
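The "knows it doesn't know, and will look it up" behaviour described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `CORE_MEMORY`, `core_answer`, `answer`, and `fake_search` are hypothetical names, and a dictionary lookup stands in for both the model and the external search tool. It is not how any particular model implements this.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the little knowledge a "cognitive core" keeps.
CORE_MEMORY = {"2 + 2": "4"}


def core_answer(question: str) -> Optional[str]:
    """Return an answer only if the core actually knows it, else None."""
    return CORE_MEMORY.get(question)


def answer(question: str, lookup: Callable[[str], str]) -> str:
    """Answer from the core when possible; otherwise delegate to a lookup tool."""
    known = core_answer(question)
    if known is not None:
        return known
    # The core recognises the gap in its knowledge and looks the fact up.
    return lookup(question)


def fake_search(question: str) -> str:
    """Illustrative external source; a real system would call a search or retrieval service."""
    facts = {"capital of France": "Paris"}
    return facts.get(question, "not found")
```

Calling `answer("capital of France", fake_search)` goes through the lookup path, while `answer("2 + 2", fake_search)` is answered by the core directly. The point of the design is that factual recall lives outside the model, so the model itself can stay small.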

4B matches models with hundreds of billions of parameters: what did NewCheng Alpha do?

Nextie (Tomorrow's New Journey) is the company that took Karpathy's "cognitive core" from concept to product.

This company trains open-source reasoning models using reinforcement learning to decouple knowledge from cognition: it removes the memorized knowledge reserves from the model and strengthens its generalization and abstract thinking abilities.

The resulting model, named NewCheng Alpha, has 4B parameters. It has been trained and deployed, and is the first product in the industry to be defined as a "cognitive model".

Its training method starts from an uncommon place.

The team at Tomorrow's New Journey compiled academic papers spanning 220 years, from 1800 to 2020, in an attempt to trace the evolution of swarm intelligence and inform its technical roadmap.

Based on this research, the team applied reinforcement learning to an open-source reasoning model, focusing on improving its generalization and abstraction capabilities.

To give a vivid example: the trained model can transfer the decision-making patterns of Go players to everyday life scenarios. Karpathy's idea of retaining the "thinking algorithms" has a concrete technical implementation here.

In terms of performance, NewCheng Alpha achieved output quality equivalent to large models such as GPT-5.4 in swarm intelligence tasks (debate, reflection, challenge, voting, etc.), with significant advantages in computing power consumption and inference speed.
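To make the task types named above more concrete, here is a toy sketch of a swarm loop with independent proposals, reflection rounds, and a final majority vote. It assumes nothing about the company's actual framework: `make_agent` and `swarm_decide` are hypothetical names, each "agent" is a plain function rather than a model, and the challenge step is not modelled.

```python
from collections import Counter
from typing import Callable, List

# An agent maps (question, answers from the previous round) to its current answer.
Agent = Callable[[str, List[str]], str]


def make_agent(initial: str, stubborn: bool = False) -> Agent:
    """Build a toy agent that proposes `initial` and may later adopt the majority view."""
    def agent(question: str, previous_answers: List[str]) -> str:
        if not previous_answers or stubborn:
            return initial
        # Reflection: switch only if one answer holds a strict majority.
        top, count = Counter(previous_answers).most_common(1)[0]
        return top if count > len(previous_answers) / 2 else initial
    return agent


def swarm_decide(question: str, agents: List[Agent], rounds: int = 2) -> str:
    """Run proposal, reflection rounds, and a final majority vote."""
    answers = [agent(question, []) for agent in agents]        # independent proposals
    for _ in range(rounds):                                     # debate / reflection
        answers = [agent(question, answers) for agent in agents]
    return Counter(answers).most_common(1)[0][0]                # vote
```

With three agents proposing "A", "A", and "B", `swarm_decide` returns "A". In a real system every agent call is a model inference, which is why the per-call cost of the underlying model dominates the cost of the whole swarm.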

Even more noteworthy are the application scenarios this model unlocks, which come in three progressive layers.

The first layer focuses on improving the quality of multi-agent decision-making.

In the Harness decision-making framework, the output of the cognitive model outperforms that of the reasoning model.

The upgrade of the underlying model from "reasoning" to "cognition" brings about a leap in the overall quality of the decision-making chain in multi-agent collaborative systems.

The second layer reduces computing power costs by a significant margin.

Even when deployed in the cloud, the model costs significantly less in computing power than models with hundreds of billions of parameters.

NewCheng Alpha also supports edge deployment: MacBooks and smart devices can run it directly, which converts computing costs into electricity costs.

This is particularly significant for the field of embodied intelligence: using a large model with hundreds of billions of parameters to drive a household robot consumes a large number of tokens every time it "thinks," and the overall cost may be more expensive than hiring a human to do housework.

4B edge deployment fundamentally changes this calculation.
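The shift from token costs to electricity costs can be written out as simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and every number in the usage note is a placeholder chosen to show the calculation, not a measured price, power draw, or workload from the article.

```python
def cloud_cost_per_day(tokens_per_day: float, price_per_million_tokens: float) -> float:
    """Daily cost of calling a hosted model, billed per million tokens."""
    return tokens_per_day / 1_000_000 * price_per_million_tokens


def edge_cost_per_day(device_watts: float, hours_per_day: float, price_per_kwh: float) -> float:
    """Daily electricity cost of running a model locally (hardware purchase cost excluded)."""
    return device_watts / 1000 * hours_per_day * price_per_kwh
```

For example, with placeholder inputs of 50 million tokens per day at 2.0 currency units per million tokens, `cloud_cost_per_day(50_000_000, 2.0)` gives 100.0. A device drawing an assumed 60 W for 24 hours at an assumed 0.15 per kWh gives about 0.22 from `edge_cost_per_day(60, 24, 0.15)`. Real comparisons would also need to account for device cost, throughput, and output quality.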

The third layer unlocks proactive scenarios.

Currently, the vast majority of AI products operate in a reactive mode—the user issues commands, and the model responds.

Proactive mode means that intelligent agents make autonomous decisions and execute tasks without waiting for commands. Its commercial scale far exceeds that of reactive mode, but it has always been held back by the cost of computing power.

NewCheng Alpha supports 24/7 uninterrupted operation at a controllable cost, making possible the proactive intelligent agents that were previously shelved due to their high cost.
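The difference between the two modes can be sketched in a few lines. This is a hypothetical illustration, not the product's design: `reactive_agent`, `proactive_agent`, and `thermostat_policy` are invented names, and a hard-coded rule stands in for the model's decision.

```python
from typing import Callable, Iterable, List, Optional


def reactive_agent(command: str) -> str:
    """Reactive mode: nothing happens until the user issues a command."""
    return f"done: {command}"


def proactive_agent(
    observations: Iterable[dict],
    decide: Callable[[dict], Optional[str]],
) -> List[str]:
    """Proactive mode: watch a stream of observations and act without being asked."""
    actions = []
    for obs in observations:
        action = decide(obs)  # one decision call per observation, around the clock
        if action is not None:
            actions.append(action)
    return actions


def thermostat_policy(obs: dict) -> Optional[str]:
    """Hypothetical decision rule standing in for a cognitive model."""
    if obs["temp_c"] > 28:
        return "turn on cooling"
    return None
```

Feeding `proactive_agent` the observations `[{"temp_c": 25}, {"temp_c": 30}]` with `thermostat_policy` yields a single action for the second reading. Because the proactive loop makes a decision call for every observation whether or not it acts, its cost scales with running time rather than with user requests, which is the cost barrier the article describes.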

The team's trump cards and market positioning

Tomorrow's New Journey was founded by the Microsoft Xiaoice founding team.

The team's motto is "winning with small parameters against large parameters": its previously trained open-source model Rinna (the Japanese Xiaoice) topped the Japanese Hugging Face leaderboard with 3.6B parameters, defeating Llama with 65B parameters.

NewCheng Alpha uses 4B parameters to achieve the same level of performance as large-scale models with hundreds of billions of parameters, continuing the same technical DNA.

Tomorrow's New Journey's key investment area is Harness swarm multi-agent technology.

This field is gaining recognition from leading investors – in March 2026, OpenAI invested in the startup Isara, directly pushing its valuation to $650 million. Isara's research focuses on multi-agent collaboration and swarm intelligence.

https://www.wsj.com/tech/ai/openai-backs-new-ai-startup-seeking-bot-army-breakthroughs-a0b1fedc

In the Intelligent Depth Evaluation (IDI) of this field, Tomorrow's New Journey's overall performance is significantly higher than that of any single large model.

Capital has validated the value of the sector, while evaluation data establishes Tomorrow's New Journey's position within it.

The two signals combined point to the same conclusion: multi-agent swarms are the next high-value direction for AI applications, and cognitive models are the key infrastructure driving it.

Cognitive models change not just the parameters, but also the ledger

The GPU cost-to-revenue ratio is a sword of Damocles hanging over every AI company.

The solution offered by the cognitive model points to a rebuilt economic model: if 4B parameters can deliver results that previously required hundreds of billions, then the same output quality comes with a completely different cost structure.

In an interview, Tomorrow's New Journey revealed that the team is training an 8B cognitive model with stronger generalization capabilities.

If 4B can already rival GPT-5.4 in swarm intelligence tasks, the capability boundaries of 8B are worth looking forward to.

A more profound question remains for the entire industry: when the cost of running a cognitive model on the edge 24/7 drops to a negligible level, all AI products designed today around the reactive "user issues commands, model responds" pattern may need to re-examine their product form.

The commercial potential of proactive intelligent agents far exceeds that of current reactive intelligent agents.

This article is from the WeChat public account "New Zhiyuan", author: ASI Revelation, and published with authorization from 36Kr.


