Alireza Ghods's Insight

01-28

Here's a great breakdown of the difference between Vision Language Models (VLM) and Video-Action Models (VAM):

VLMs and VAMs are incredibly powerful. They shine at perception, retrieval, search, and semantic understanding. If you want to find, classify, or reason about what’s in the world, VLMs are unmatched.

But Physical AI breaks on something else: motion, causality, and dynamics. That’s where video data and World Models come in. They learn how the world evolves over time, not just what objects are called.

The future is not VLM or World Models. It’s both. VLMs to understand and retrieve reality. World Models to simulate it, stress it, and train agents inside it.

Different tools. Different layers. Same goal: machines that actually work in the real world.

twitter.com/AlirezaGhods2/stat...

From Twitter

