Dialogue with a16z: LLMs are lossy compression, and world models are the real direction


World Labs is a startup founded in 2024 by renowned AI expert and Stanford University Professor Fei-Fei Li, dedicated to developing next-generation AI systems with "spatial intelligence".

Since its founding, World Labs has completed two rounds of financing, raising a total of approximately $230 million. Its main investors include a16z, Radical Ventures, NEA, NVIDIA NVentures, AMD Ventures, and Intel Capital. The company's valuation passed $1 billion within just three months, making it a new unicorn in the AI field.

Recently, Fei-Fei Li sat down with two a16z partners, Martin Casado and Erik Torenberg, and publicly shared the conceptual framework, research direction, and grand vision behind the founding of World Labs, tracing the a16z platform strategy from "unwilling to clean up" to "full-stack service".

Fei-Fei Li opened with the core thesis of the conversation: "I don't need a large language model to convince me; the world model is the truly important direction."

She emphasized that spatial intelligence - whether in the three-dimensional physical world we live in or the imagined digital universe - is an indispensable component of intelligence. Today, we finally have the ability to generate and reconstruct these universes.

Intelligence Older Than Language: Spatial Perception and 3D Reconstruction

Fei-Fei Li noted that, compared to language, spatial perception is a more ancient and instinctive ability in human evolution. She shared a personal experience: years ago, a corneal injury temporarily cost her stereoscopic vision, and during that time she was afraid to drive alone, unable to judge the distance to other cars even on familiar streets.

This first-hand experience made her deeply aware of the fundamental role that three-dimensional perception plays in human action. Without a three-dimensional world model, AI cannot truly understand, manipulate, or reconstruct the real world.

Martin Casado added that this lack of three-dimensional intelligence is the key reason robots and embodied intelligent systems have been so difficult to deploy. He illustrated with a simple example: if you bring someone into an unfamiliar room, blindfold them, describe the space only through language, and then ask them to complete a task, it would be almost impossible. But once they open their eyes, the brain automatically reconstructs a spatial model and they can act. This reconstruction ability is completely absent from today's mainstream language models.

From NeRF to the Technical Tipping Point of World Models

Asked why they chose to found World Labs now, Fei-Fei Li said the timing reflects years of accumulated academic research and industry groundwork.

She recalled that four years ago, a research breakthrough called NeRF (Neural Radiance Fields) had already opened up a new path for three-dimensional visual modeling. Ben Mildenhall, who proposed NeRF, is now one of the co-founders of World Labs.

Another founder, Christopher, did pioneering research on efficient three-dimensional representations, helping bring volumetric 3D modeling back into industry.

Together with Justin Johnson, who was early in applying GAN technology to image style transfer, these once-scattered lines of research are now united in a single team around one "North Star" goal: building world model capabilities for AI.

Martin summarized this goal as the deep integration of two systems: first, the AI models, data, and architecture themselves; second, the engineering systems for graphics rendering and spatial reconstruction. Getting experts from these two worlds to collaborate efficiently on one platform is itself an important organizational innovation in the technology industry.

Language Models Are Not the Endpoint, But the Prelude

Fei-Fei Li emphasized that her belief in world models does not stem from disappointment with LLMs, but from a deeper understanding of the essence of intelligence.

She pointed out that language is a "lossy compression" of cognition: it abstracts the world while discarding rich physical and perceptual information. The real world has no words, grammar, or text, only physics, motion, and three-dimensional structure.

This view also changed her perception of what an AI company should look like. Moving from Stanford professor to entrepreneur, she realized that modeling spatial intelligence requires more than academic research. It needs industrial-scale computing power, system-level architecture and scheduling, and collaboration among top cross-disciplinary talent.

All of this can only be realized in a highly organized company with outstanding full-stack engineering collaboration.

Spatial Intelligence Applications Far Beyond Robotics

For most people, "world models" remain an abstract scientific term. But Fei-Fei Li and Martin both pointed out that their applications extend far beyond autonomous driving and robotics.

Creativity is essentially visual. Industrial design, film production, architectural composition, and even game development all depend on three-dimensional construction and manipulation. If AI possesses world model capabilities, it can not only "understand" the three-dimensional world but also "generate" and "manipulate" virtual spaces.

Martin described how, from just a photo of a table, the model could infer its form and material and then construct a complete spatial scene. On that basis, users could measure the space, add or delete objects, or redesign it entirely. This is a more intuitive and flexible form of human-machine interaction, one that opens up new dimensions for design, creation, and simulation.

Fei-Fei Li went further, arguing that digital spaces present an unprecedented opportunity for transformation: "Humans have so far only lived in a three-dimensional physical world. But the digital world will, for the first time, allow us to enter 'multiple universes'."

She listed several examples: some universes designed for robots, some serving human creativity, and some for storytelling, communication, and travel experiences. These spaces, which once existed only in imagination, can now actually be generated, and machines can understand, use, and transform them.

The Next Battle of Foundation Models: Full Panoramic 3D Modeling

Returning to the technology itself, Fei-Fei Li emphasized that World Labs is not just about creating an AI that "can see", but about enabling AI to understand the world's three-dimensional structure, dynamics, and combinatorial logic. This is not merely a harder engineering problem, but a fundamentally new way of representing the world.

She believes that scientific discoveries such as DNA's double helix structure and buckyballs are products of spatial intelligence. Pure language cannot derive such geometric constructions. This is why world models can not only improve machine understanding but also potentially open new creative paths for human science and art.

Martin summarized that the LLM revolution proved one thing: when we find the right data structure and model representation, AI's capabilities can grow explosively. They believe "world models" now stand at a similar tipping point.

The Key to Understanding and Constructing the World

"We are actually walking backwards through the path of evolution," Martin said, taking the conversation to a philosophical level.

Language is one of the most recent modules to appear in the evolution of the human brain, while spatial perception systems have existed since arthropods, spanning five hundred million years. An AI that has merely "learned language" cannot truly be said to "understand the world". Only by constructing a human-like spatial model can AI be considered to have truly entered the realm of "embodied intelligence".

Fei-Fei Li concluded in her characteristically firm tone: "I have been waiting for this day. Not because I don't believe in language models, but because I know deeply that the real world is not composed of text."

The world model is the key to allowing AI to truly understand and construct this world.
