Embodied intelligence eagerly awaits its "ChatGPT moment".


Fast Reading

  • The spread of large models required almost no new physical infrastructure: computing power is concentrated in the cloud, and terminals are merely the entry point. Embodied intelligence is entirely different; it is a physical system that integrates hardware, algorithms, environmental perception, and operation and maintenance systems.
  • While most robots have made great progress, they are still "limited to a single workbench" and struggle to complete continuous and complex tasks across spaces and modalities.
  • For embodied intelligence, the "ChatGPT moment" is more of a borrowed metaphor than a replicable path. If large models prove the explosive power of algorithms, then embodied intelligence tests the endurance of the entire industrial system.

Among the many branches of artificial intelligence, embodied intelligence has been one of the most frequently discussed over the past year.

From industrial robots to service robots, from autonomous driving to humanoid robots, each technological route is periodically expected to become a "universal intelligent gateway."

However, unlike the algorithm-driven software revolution, embodied intelligence has always been slowed down by the friction of the real world.

If you only look at publicly available videos, the narrative of embodied intelligence is dominated by much the same images: robots run more steadily, grasp more accurately, move more smoothly, and perform more complex tasks. Funding is accelerating, models are iterating, and embodied intelligence seems to be on a sure-fire upward curve.

But successes are played on repeat and failures are cut out. Outside the lab, another narrative exists: deployment costs, stability problems, and maintenance complexity continue to lengthen the timeline for commercialization.

On February 10, Yuanli Lingji held its first technology open day at the Zhongguancun National Innovation Demonstration Zone Exhibition Center in Beijing.

The company released three core products: DM0, a native embodied model; Dexbotic 2.0, a native embodied development framework; and DFOL, a mass-production workflow for native embodied applications. It was also the first collective public appearance by the company's core team since its founding nearly a year ago.

At the "Physical AI Next" roundtable forum that day, five guests from industry, academia, and research spent about half the time discussing one question:

When will the ChatGPT moment of embodied intelligence arrive?

The "ChatGPT moment of embodied intelligence" is a composite concept that blends technological breakthroughs, product experience, and business imagination. It refers both to a leap in model capabilities and to an expectation: that, like ChatGPT, embodied intelligence should be quickly understood and used at low cost by non-technical users, and should spread at scale.

It carries a certain technological optimism and excitement. After all, following ChatGPT's release, large models quickly moved from the lab to hundreds of millions of users worldwide, completing a clearly identifiable leap.

People naturally wonder whether artificial intelligence will also experience a similar breakthrough when it has a body—an entity that can walk, grasp, and manipulate the physical world.

ChatGPT's success lies in its ability to provide a user experience that is low-cost, highly stable, and reproducible: anyone can open a browser, type in a sentence, and get the output within seconds. This "out-of-the-box" feature has made it a widely used tool.

More importantly, the spread of large models required almost no new physical infrastructure: computing power is concentrated in the cloud, and terminals are merely the entry point. For the industry, this represents a typical "asset-light leap."

Embodied intelligence is entirely different. It is a physical system that integrates hardware, algorithms, environmental perception, and operation and maintenance systems.

Wang Zhongyuan, president of the Beijing Academy of Artificial Intelligence, believes that even with improvements in model capabilities, we are still far from the ChatGPT moment of embodied intelligence. "Especially after the deployment of embodied intelligence models and real hardware devices, we found that there is still a significant gap between this and the large-scale applications we truly hope for."

This gap stems from the inherent uncertainties of the physical world: whether the ground is flat, whether the lighting changes, whether the components have minute tolerances, whether the sensors will age. Any one of these variables could cause a task to fail.

This is why, at the current stage, embodied intelligence is still "demonstrable" rather than "mass-replicable": a single success does not equal systemic success.

More importantly, the same robot can behave drastically differently at different times and in different places. This means that it cannot provide a uniform and predictable experience for all users the way ChatGPT does. And a "moment" depends precisely on this kind of shift that everyone can perceive at once.

Wang Yu, a tenured professor in the Department of Electronic Engineering at Tsinghua University, believes that although most robots have made great progress, they are still "limited to a single workbench" and find it difficult to complete continuous and complex tasks across spaces and modalities.

He even proposed a disruptive idea: future residential design might need to incorporate a "robot-adaptive" dimension. In other words, instead of "demanding" that robots adapt to the chaotic living environment of humans, it would be better to let buildings and infrastructure proactively optimize for machines.

Snow Leopard Finance believes that this path is not unfamiliar in industrial history—assembly lines, elevators, and automatic doors all involve first changing the space and then releasing the value of automation. Embodied intelligence may also require a similar "environmental engineering."

Wang Yu's viewpoint also reveals a major difference between large models and embodied intelligence: large models operate in a highly standardized digital world, while embodied intelligence must venture into a physical world designed for humans, not machines. The former is a chessboard with clear rules, while the latter is a noisy wilderness.

How exactly should the ChatGPT moment be defined?

According to Jiang Daxin, founder and CEO of Jieyue Xingchen, a key feature is zero-shot capability. "Zero-shot capability enables generalization. Give it any instruction, even one it has never seen before, and it can answer the question. This is completely different from traditional natural language processing, which is why everyone was so excited about ChatGPT."

Comparing natural language processing with embodied intelligence, Jiang Daxin believes that achieving a "ChatGPT moment" in embodied intelligence will be more challenging. He explains that generalization in embodied intelligence spans multiple dimensions, such as scenarios, tasks, and goals, and there is no consensus on which dimension should define a "breakthrough."

A technological breakthrough alone does not necessarily amount to an inflection point for a product or an industry. This misalignment is precisely why the "ChatGPT moment" is repeatedly discussed in the field of embodied intelligence, yet remains difficult to achieve.

When even the standard for "success" cannot be agreed on, the "moment" naturally becomes vague rhetoric.

Entrepreneurs focused on commercial applications are shifting towards a more pragmatic definition. Tang Wenbin, co-founder and CEO of Yuanli Lingji, sees the ChatGPT moment as the point at which embodied intelligence becomes a tool that is useful, trustworthy, and quantifiable in terms of return on investment (ROI).

Tang Wenbin frankly admitted, "Although the industry is very hot and flourishing, our overall (embodied) intelligence capabilities are still in the infancy stage."

Gao Jiyang, founder and CEO of Xinghaitu, pointed out from the perspective of the industry chain that for large models, "the model is the product." The terminals of large language models are mobile phones and computers, and the distribution channel is social media. Once the model is ready, the entire commercialization and industrialization chain is immediately in place. The chain of embodied intelligence, however, is extremely long, running from the supply chain and complete machine assembly to the data closed loop and after-sales service, and the algorithm is actually the link with a shorter dissemination cycle.

This means that the commercialization pace of embodied intelligence also shows the characteristics of manufacturing: slow capital recovery, high failure costs, and a failure in any single link amplifies the overall risk. A breakthrough in a single technology is unlikely to drive the commercialization of the entire system. "From the perspective of the business production line, the ChatGPT moment for embodied intelligence is the moment when we truly see its commercial value within certain limited scopes," said Gao Jiyang.

For embodied intelligence, the "ChatGPT moment" is more like a borrowed metaphor than a path that can be replicated.

The real watershed moment for embodied intelligence may not be a technological miracle that attracts nationwide attention, but rather the day when it quietly becomes an indispensable but undiscussed presence in factories, warehouses, and industrial parks.

But before that final stage arrives, its maturation will be more like the evolution of infrastructure—slow, silent, yet indispensable.

If large-scale models demonstrate the explosive power of algorithms, then embodied intelligence tests the endurance of the entire industrial system.

In this marathon without any "miracles," the winner may not be the one with the coolest algorithm, but rather the one who understands the supply chain best, is best at running a closed loop on real devices, and is most willing to immerse themselves in specific scenarios.

This article is from the WeChat official account "Snow Leopard Finance" (ID: xuebaocaijingshe), author: Cao Quanjing, editor: Huang Yuntao, published with authorization from 36Kr.
