He just raised 2.7 billion yuan, and Fei-Fei Li invested too.


In today's venture capital market, "world model" is undoubtedly the hottest buzzword. New "world model" companies close funding rounds almost daily, with skyrocketing valuations and impressive shareholder lists. These funding announcements repeatedly stress one point: a truly capable super-intelligent agent should not acquire its abilities solely through data input, but should actively understand the physical world the way a human does.

But Pete Florence wrote a long open letter after starting the company, beginning with: "Don't label my company as a world model."

This is a genuine role reversal, because Pete Florence is far more than just an entrepreneur. Before starting his own company, he worked at Google DeepMind, rising from researcher to senior research scientist. He was one of the core developers of Gemini Robotics, the robot control model DeepMind released in 2025. His most influential achievement of that period, however, came in 2023, when he and his colleagues introduced a completely new robot model architecture to the world: "Vision-Language-Action Models" (VLA).

(Pete Florence. Source: social media)

Yes, that's right. If the "world model" or "VLA" is currently the most cutting-edge and widely accepted direction, then Pete Florence is undoubtedly one of its pioneers. It is striking that someone like him is the one leading the charge to discard the "world model" label.

Now the surprise has doubled. Generalist AI, the embodied intelligence company founded by Pete Florence, recently completed a new round of financing, raising a total of $400 million (approximately RMB 2.7 billion) at a valuation of $2 billion (approximately RMB 13.55 billion). Investors in this round include NVIDIA's NVentures, NFDG (managed by renowned angel investors Nat Friedman and Daniel Gross), Bezos Expeditions (Jeff Bezos's family office), Xiaomi co-founder Lin Bin, Zoom founder Eric Yuan, and Fei-Fei Li, one of the world's most representative scientists in the field of world models.

"Goals" are more important than "labels".

Why does Pete Florence, one of the field's pioneers, so strongly resist being labeled a "world model" company? And why did Fei-Fei Li, one of the most representative scholars in the field of world models, back such an open "heretic" with real money? The story may begin in 2019.

At that time, Pete Florence was pursuing a PhD in Computer Science at MIT, focusing on robot manipulation, computer vision, and natural language processing. From this perspective, he was a conventionally trained researcher with a mainstream research direction and academic background, not someone who needed to be "unconventional" to win resources. The twist was that MIT assigned him an advisor named Russ Tedrake.

Who is Russ Tedrake? First and foremost, he is an academic heavyweight. In 2019, he was Professor of Electrical Engineering and Computer Science at MIT and Director of the Robotics Center at the Computer Science and Artificial Intelligence Laboratory. He led the MIT team in the prestigious DARPA Robotics Challenge. Outside academia, he also serves as Vice President of Robotics Research at Toyota Research Institute. It is fair to say that Russ Tedrake is one of the most outstanding scholars in robotics, with ample resources to help the young Pete Florence realize his academic ambitions.

By his own account, however, what fascinated Russ Tedrake was not code but physics. In a self-introduction, he recalled that his path into computer science grew out of his research on bipedal robots, where he observed "rich dynamic characteristics" that sparked a strong interest in "complex fluid dynamics control." So while other researchers entering the field first studied how to make robots grab apples or fold blankets, his early research focused on how to control "aircraft or flapping-wing aircraft after stalling" and how to "cross dense obstacles at high speed."

This background means that Russ Tedrake places great emphasis on "understanding the physical world." MIT's website describes his academic profile as follows: "The professor's research focuses on finding elegant control solutions for interesting (underactuated, stochastic, and/or difficult-to-model) dynamical systems and on building these systems for experimental verification. He is particularly interested in the connection between mechanics (especially nonsmooth mechanics) and machine learning/optimization theory to achieve robust control design for complex mechanical systems."

Immersed in this environment, Pete Florence naturally became a "physics-oriented" figure within computer science. His most representative academic achievement during his doctoral studies, for example, was a paper titled "Self-Supervised Correspondence in Visuomotor Policy Learning." The paper proposed a method that, through imitation learning, allows robots to learn challenging manipulation tasks from just 50 demonstrations, while also generalizing across categories of objects and adapting to the configurations of deformable objects. The paper won a 2020 IEEE (Institute of Electrical and Electronics Engineers) best paper award in robotics and automation.

Of course, which "school of thought" he belonged to is not important. What matters is that this environment gave Pete Florence a completely different way of thinking. Many researchers are accustomed to starting from existing technologies, then running experiments to determine their feasibility, and only at the end settling on application scenarios. Pete Florence, however, believed the correct order was to "first set specific goals" and then design the technological path.

After joining Google's DeepMind team, Pete Florence began his work in this direction, with his first major achievement being the Transporter Network, Google's first-generation robot model architecture, launched in 2021. In the paper announcing the model, Florence stated that organizing items should be a very basic skill, but for a robot, completing this action involves "high-level and low-level perceptual reasoning," requiring consideration of where books should be placed and in what order, while also ensuring that the edges of the books are aligned to form a neat stack.

Transporter Network is a model architecture designed to "make simple actions simple," enabling robots to perform various operations based on vision in a general way. It has a fast training speed and is less dependent on the training environment.

The VLA architecture, released in 2023 together with his DeepMind colleagues, was a natural progression of this idea. In the paper that ushered in the current golden age of modeling, the authors stated that they hoped the VLA architecture could "significantly improve the ability to generalize to new objects, interpret instructions not present in the robot's training data (such as placing objects on specific numbers or icons), and perform basic reasoning based on user instructions (such as picking up the smallest or largest object, or picking up the object closest to other objects)."

Returning to the initial question: why does Pete Florence, one of the field's pioneers, so strongly resist being labeled a "world model" company? The answer is the same: Pete Florence believes that "goals" are more important than "labels."

In his view, the current enthusiasm for world models is actually "idea-driven." A significant portion of it, for example, can be attributed to the capital market's excitement at finding a non-consensus bet inside a hot sector. Moreover, if we truly want robots to enter our work and lives and create productivity, then building a "world model" is clearly not the goal. The real goal should be for robots to complete previously unseen tasks with extremely high success rates and speeds, without any task-specific data.

This is precisely why Pete Florence decided to leave Google DeepMind and start his own business. At the 2025 NVIDIA GTC conference, Pete Florence first appeared in the public eye as the co-founder and CEO of Generalist AI. He said, "We are determined to build robots that can do anything... Imagine what it would be like if the marginal cost of manual labor dropped to zero."

99% success rate

Besides his unconventional technological ideas, Pete Florence's entrepreneurial path also appears quite unorthodox.

In theory, entrepreneurs with such a resume would be highly sought after by VCs today. Yann LeCun, Ilya Sutskever, and Mira Murati are examples; all of them completed seed rounds exceeding $1 billion almost immediately after their companies were registered (or even before). However, Pete Florence's Generalist AI received investment from only a handful of institutions in its early stages, including Nvidia, Bezos's family office, and NFDG. If Nvidia's venture capital arm, NVentures, had not organized an "Investee Companies Roundtable" at the 2025 GTC conference, no one would have known that he had already left to start his own business.

Why is this the case? The most likely answer is that it was Pete Florence's deliberate choice. As mentioned above, he joined Google's DeepMind team immediately after graduation and worked there from 2019 to 2025, with no other work experience in between. In other words, Generalist AI is his first venture, and he had every reason to approach it with extreme caution.

In fact, at NVIDIA's GTC conference in 2025, where he made his first public appearance as an entrepreneur, Pete Florence clearly displayed this caution. Aside from telling everyone that he was building "robots," he revealed no specific business direction, stating directly, "We are still in stealth mode."

It wasn't until November 2025 that people first saw the specifics of Generalist AI's business, when the company released its first-generation embodied intelligence model, GEN-0. In its official introduction, Generalist AI stated that GEN-0 combines the advantages of vision and language models while also surpassing them: GEN-0 can capture human-level reflexes and common sense about physics.

In short, its capabilities keep improving as model size and training data increase, breaking through the bottleneck of earlier small models. It can think and act simultaneously like a human, responding quickly and naturally in real physical environments. It adapts to different types of robots without additional modification. More importantly, it relies on massive amounts of real-world operational data, is no longer constrained by data scarcity, and allows the composition of its training data to be adjusted flexibly. Numerous tech media outlets have pointed out that GEN-0 shows that the mathematical "scaling laws" driving large language models such as ChatGPT also apply to physical motion.

However, GEN-0 is not perfect. For example, GEN-0 did not solve the dataset problem that plagues the field of embodied intelligence. Therefore, in April 2026, Generalist AI quickly iterated to the new version GEN-1.

(“Robotic arm”, source: Generalist AI social media)

To address the dataset challenge, Generalist AI developed a wearable device that captures the minute movements and visual information of humans performing manual tasks. Generalist AI stated that during the development of GEN-1, it collected over 500,000 hours of "petaflop-level physical interaction data" with these devices to train its physical model. After thorough training, Generalist AI claims, GEN-1 achieves a 99% success rate in repetitive but delicate mechanical tasks such as folding cardboard boxes, packing phones, and maintaining robotic vacuum cleaners, at roughly three times the speed of its predecessor, GEN-0, and within about an hour.

Generalist AI therefore proudly announced that GEN-1's physical model is approaching an inflection point similar to GPT-3, with performance on some tasks beginning to "reach the level required for deployment in commercially viable environments," and that "we can expect each new generation of models to bring a series of increasingly complex new tasks that can be mastered."

In his official blog post, Pete Florence pointed out that the development of GEN-1 best exemplifies his personal technological philosophy. First, he set a clear goal: for the robot to complete a wide variety of previously unseen tasks with extremely high success rates and speeds, without any task-specific data. Next, based on this goal, he laid out a path toward it: allow a small amount of robot data for a specific task (call this amount X), reach a high level of execution on that task, and then keep reducing X while continuing to improve performance.

At this point, the question we raised earlier has been answered. Whether Generalist AI's product is actually called a "world model" is no longer important. As long as you see potential in the embodied intelligence industry and believe that robots can be widely adopted in actual production, then Generalist AI is indeed a worthwhile investment. And this round of financing for Generalist AI was indeed finalized quickly within two months of the GEN-1 launch.

According to reports, existing shareholders Nvidia, Bezos Expeditions, and NFDG all chose to reinvest, and even doubled their investments. New investors include Xiaomi co-founder Lin Bin, Zoom founder Eric Yuan, and Chinese-American scientist Fei-Fei Li, as well as institutional investors such as Radical Ventures, 8VC, Union Square Ventures, Hanabi Capital, and Norwest.

In other words, by June 2026, Pete Florence no longer needs to prove himself. At the very least, the bold claims he has made over the years are on their way to being delivered. One example came in a 2025 podcast, shortly after he started the company: "General-purpose robots are not about dabbling in everything, but about being professional enough to be useful enough in real-world tasks."

This article is from the WeChat official account "Touzhong.com", author: Pu Fan.
