Former OpenAI researcher Kevin Lu has joined Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati. In July 2025 the company raised approximately $2 billion in early funding at a valuation of around $12 billion. Kevin Lu previously led the release of GPT-4o mini and has long worked on reinforcement learning, small models, and synthetic data.
Another prominent Chinese researcher has just left OpenAI.
Former OpenAI researcher Kevin Lu has announced that he is joining the AI startup Thinking Machines Lab.
Kevin Lu led the release of GPT-4o mini and contributed to work on o*-mini, o3, and other models.
Thinking Machines Lab was founded by former OpenAI CTO Mira Murati.
In July 2025, the company completed a historic early funding round of approximately $2 billion (led by a16z), with a valuation of around $12 billion.
Core members of the team soon took to social media to welcome him.
Kevin Lu, who studied at UC Berkeley, is a researcher in reinforcement learning and small models. During his time at OpenAI, he focused on reinforcement learning, small models, and synthetic data.
Before joining Thinking Machines, he worked on sequential decision-making and deep learning research at Hudson River Trading and Meta AI.
The Internet Truly Drives AI Progress
Kevin Lu's hands-on experience with small models and synthetic data should help Thinking Machines shorten the path from research papers to user value.
In particular, his blog post from July became quite popular: The Internet Truly Drives AI Progress.
In it, he makes one point clearly: rather than fixating on architecture, we should expand and enrich our data and move closer to realistic data sources (such as the internet) and to the ways data is actually consumed; otherwise models will always "see little and understand little".
Blog address: https://kevinlu.ai/the-only-important-technology-is-the-internet
Below is a partial translation of the blog:
Although AI progress is often attributed to milestone papers on transformers, RNNs, and diffusion models, this overlooks AI's fundamental bottleneck: data.
So, what exactly does "good data" mean?
If we truly want to advance AI, instead of researching deep learning optimization, we should research the "internet".
The internet is the key technology that enables our AI models to scale.
Poor models! They know so little, and so much remains hidden from them.
After GPT-2, the world began to focus on OpenAI, and time has proven its influence.
What if we had the Transformer but no internet?
Low data. In a low-data regime, the Transformer might be worth little: its architectural prior is weaker than that of a CNN or an RNN, so performance would likely be worse.
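To make the "architectural prior" point concrete, here is a minimal, hypothetical PyTorch sketch (not from the blog) of how one might compare a small CNN with a small Transformer on a tiny image-classification dataset; all layer sizes and names are illustrative assumptions.

```python
# Hypothetical sketch: compare inductive biases of a CNN vs. a Transformer
# in a low-data regime. Sizes are illustrative, not from the blog.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Strong spatial prior: locality and weight sharing."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 7 * 7, n_classes),
        )

    def forward(self, x):  # x: (B, 1, 28, 28)
        return self.net(x)

class TinyTransformer(nn.Module):
    """Weak spatial prior: the image is just a sequence of patch tokens."""
    def __init__(self, n_classes: int = 10, patch: int = 7, dim: int = 64):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(patch * patch, dim)
        self.pos = nn.Parameter(torch.randn(1, (28 // patch) ** 2, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):  # x: (B, 1, 28, 28)
        b, p = x.shape[0], self.patch
        patches = x.unfold(2, p, p).unfold(3, p, p)      # (B, 1, 4, 4, p, p)
        tokens = patches.reshape(b, -1, p * p)           # (B, 16, p*p)
        h = self.encoder(self.embed(tokens) + self.pos)
        return self.head(h.mean(dim=1))

# Trained on only a few hundred labelled images, the CNN's built-in prior usually
# wins; the Transformer needs far more data (i.e. the internet) to catch up.
```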
Books. In a less extreme scenario: without the internet, we might pre-train on books and textbooks. Textbooks are often viewed as the pinnacle of human wisdom: their authors are well educated and choose their words carefully. This reflects the belief that "high-quality data beats large quantities of data."
Textbooks and Phi. The Phi series ("Textbooks Are All You Need") performs excellently for its small model sizes, but it still relies on GPT-4, itself trained on internet data, for filtering and synthesis.
Overall, Phi is quite good, but it has not yet been shown to reach the asymptotic performance of models pre-trained on internet data, and textbooks lack broad real-world and multilingual knowledge (though they are strong under compute constraints).
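For intuition about what "relying on a strong model for filtering and synthesis" can look like, here is a hypothetical Python sketch; score_with_teacher and rewrite_as_textbook are placeholder functions standing in for calls to whatever strong model is available, not APIs from the Phi papers.

```python
# Hypothetical "filter + synthesize" pipeline in the spirit of the Phi recipe.
# The teacher model and both helper functions are placeholders for illustration.
from typing import Iterable, List

def score_with_teacher(doc: str) -> float:
    """Placeholder: ask a strong LLM to rate the document's educational value in [0, 1]."""
    raise NotImplementedError

def rewrite_as_textbook(doc: str) -> str:
    """Placeholder: ask a strong LLM to rewrite the document as a clean textbook passage."""
    raise NotImplementedError

def build_textbook_corpus(web_docs: Iterable[str], threshold: float = 0.8) -> List[str]:
    corpus: List[str] = []
    for doc in web_docs:
        if score_with_teacher(doc) >= threshold:      # keep only "textbook quality" pages
            corpus.append(rewrite_as_textbook(doc))   # synthesize a cleaner version
    return corpus

# Note the dependency the blog points out: the teacher was itself pre-trained on
# internet-scale data, so the resulting "textbooks" are still downstream of the internet.
```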
There are now some candidate ideas, but each has flaws; none of them counts as "pure research", and all involve building products around RL.
The attributes we want are: diversity, a natural curriculum, product-market fit (PMF), and economic feasibility.
Final comment: it is also possible to sacrifice some diversity at first, using RL to optimize metrics in our own products (games, vending machines, retention/profit/engagement, and so on).
This might work, but the hard part is how to "lift" it into a diverse, scalable universe of rewards and thereby trigger a paradigm-level leap.
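As a rough illustration of this idea, here is a hypothetical REINFORCE-style sketch in PyTorch that scalarizes a few product metrics into a reward; the feature size, action space, and reward weights are all made up for the example.

```python
# Hypothetical sketch: use RL to optimize product metrics. Everything here
# (feature size, action space, reward weights) is illustrative, not from the blog.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 4))  # 4 product actions
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def product_reward(retained: bool, engagement_minutes: float, profit: float) -> float:
    """Collapse a few business metrics into one scalar reward (weights are arbitrary)."""
    return 1.0 * float(retained) + 0.01 * engagement_minutes + 0.1 * profit

def run_episode_and_update(user_features: torch.Tensor) -> None:
    """Sample an action from the current policy, observe product metrics, update."""
    dist = torch.distributions.Categorical(logits=policy(user_features))
    action = dist.sample()
    # ... deploy `action` in the product and observe the outcome (placeholder values) ...
    reward = product_reward(retained=True, engagement_minutes=12.0, profit=0.3)
    loss = -reward * dist.log_prob(action)     # REINFORCE: maximize expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The blog's worry applies directly: such a reward is narrow. The open question is
# how to lift it into something as diverse and scalable as internet text was for NTP.
```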
In short, we are still far from finding a counterpart for RL as elegant and productive as the internet was for next-token prediction (NTP).
Finally, Kevin Lu emphasizes again that during training a model only "sees" what is in the dataset; everything outside it is effectively ignored (given zero weight).
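In standard notation (my own framing, not a formula from the blog), the training objective is an expectation over the empirical data distribution, so anything outside the dataset's support contributes exactly zero to the loss and its gradient:

```latex
% Empirical training objective: only examples x in the dataset D carry weight;
% anything outside D has probability 0 under \hat{p}_{\mathcal{D}} and is ignored.
\mathcal{L}(\theta)
  = \mathbb{E}_{x \sim \hat{p}_{\mathcal{D}}}\!\left[ -\log p_\theta(x) \right]
  = -\frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \log p_\theta(x)
```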
We hope that one day we will find a way to solve this problem.
References:
https://x.com/_kevinlu/status/1942977315031687460
This article is from the WeChat official account "New Intelligence", author: New Intelligence, published with authorization by 36Kr.