Text | Xu Muxin
Editor | Liu Jing
A few months ago, a chart from OpenAI circulated online, dividing its path to AGI into five levels:
Level 1: Chatbot, an AI with conversational capabilities.
Level 2: Reasoner, an AI that can solve problems like humans.
Level 3: Agent, an AI system that can not only think but also take action.
Level 4: Innovator, AI that can assist in invention and creation.
Level 5: Organizer, an AI that can do the work of an entire organization.
The roadmap is beautiful, but we are mostly stuck at L1. The most obvious example: for lack of reasoning ability, large models cannot even answer the question "Which is bigger, 9.8 or 9.11?" The Transformer architecture can only fit an answer by mining massive amounts of data; it cannot answer questions or reason the way humans do. And because it cannot reason in multiple steps, your AI agent cannot generate a plan with one click, leaving many AI application scenarios out of reach.
The Transformer, once regarded as the revolutionary of the AI industry, could not escape its own moment of being revolutionized. Wang Guan is one of the revolutionaries. Rather than using RL tricks to squeeze the remaining potential out of LLMs, he chose to build a general RL large model from scratch, sidestepping the theoretical limits of LLMs with an approach closer to how fast thinking and slow thinking actually work.
After we had waited a while at the agreed location, this Tsinghua graduate, born in 2000, came hurrying over from campus. He was thin, in plain sportswear with a backpack, like the science nerds you can find everywhere at the school.
Like the genius geeks of "The Big Bang Theory", Wang Guan is hard for non-technical people to talk to: he politely reaches for technical terms, racking his brain to explain things simply, mostly in vain. Some technical questions he cannot answer right away; he falls silent for a long stretch, and only after an awkward pause can he assemble language he considers precise. But when the talk turns to his field, he speaks excitedly and without pause, sometimes forgetting to breathe, until he suddenly feels suffocated, looks up, and draws a long breath.
Yet it was this person who named the company built around his new architecture Sapient Intelligence. "Sapient" means "wise man", and the name announces his ambition.
At present, although the NLP world is still dominated by the Transformer, more and more new architectures are emerging and charging toward L2. DeepMind proposed the TransNAR hybrid architecture this year; Llion Jones, one of the eight authors of the Transformer paper, founded Sakana.AI; Peng Bo built RWKV; and even OpenAI has released a new model codenamed "Strawberry", claiming it has reasoning capabilities.
As the Transformer's limitations are gradually demonstrated and problems such as hallucination and accuracy remain unsolved, capital has begun to flow, tentatively, into these new architectures.
Sapient co-founder Austin told "Undercurrent Waves" that Sapient has closed a seed round worth tens of millions of dollars, led by Vertex Ventures, which is backed by Singapore's Temasek Holdings, with participation from Japan's largest venture capital group and top VCs from Europe and the United States. The money will mainly go to computing power and global talent recruitment. Minerva Capital serves as long-term exclusive financial advisor.
In Sapient you can see the typical path of a Chinese AI startup: a Chinese founder, the global market from day one, algorithm talent recruited worldwide, backing from international funds. Its atypical side is just as prominent: compared with the many application companies, this is a player trying to compete on core technology.
Wang Guan (left) and Austin (right)
"WAVES" is a column of Undercurrent. Here, we will present to you the stories and spirits of the new generation of entrepreneurs and investors.
GPT cannot lead to AGI?
The iteration of technology is brutally fast.
Not long after the craze for large language models began, Turing Award winner and "AI Godfather" Yann LeCun publicly warned young students who wanted to enter the AI industry: "Don't study LLM anymore. You should study how to break through the limitations of LLM."
The reason lies in the two systems of human reasoning. System 1 is fast and unconscious, suited to simple tasks, like deciding what to eat today. System 2 is slow and deliberate, handling tasks that require real thought, like solving a complex math problem. LLMs cannot complete System 2 tasks, and the scaling law cannot fix this, because the constraint comes from the underlying architecture.
"The current large model is more like memorizing answers to questions," Wang Guan explained to "Undercurrent Waves". "One view is that today's large models use System 1 to handle System 2 problems and get stuck at a System 1.5, similar to the state of a person dreaming, which is where hallucinations come from. The autoregressive model restricts you to producing each token based only on the tokens already output." Autoregression is poor at memory and planning, let alone multi-step reasoning.
The limitation can also be understood from a more philosophical angle: when the large model answers "Which is bigger, 9.9 or 9.11", does it really understand what it is doing, or is it mechanically comparing the 9 after the decimal point with the 11? If the model has no idea what it is doing, then no amount of training will help.
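That mechanical failure is easy to reproduce outside any model. A small illustration (plain Python, my own construction, not from the article): comparing the fragments on either side of the decimal point as standalone integers, roughly the way a tokenizer splits the strings, gives the wrong answer that proper numeric comparison avoids.

```python
# Comparing "9.9" and "9.11" fragment by fragment, the way a model that only
# sees token pieces might, versus comparing them as numbers.
def fragment_compare(a: str, b: str) -> str:
    a_int, a_frac = a.split(".")
    b_int, b_frac = b.split(".")
    if int(a_int) != int(b_int):
        return a if int(a_int) > int(b_int) else b
    # The mechanical step: treating "9" and "11" as standalone integers.
    return a if int(a_frac) > int(b_frac) else b

print(fragment_compare("9.9", "9.11"))  # 9.11 -- the mechanical, wrong answer
print(max("9.9", "9.11", key=float))    # 9.9  -- the numeric, correct answer
```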
Therefore, for AI to enter the L2 stage, the autoregressive Transformer architecture has to be abandoned entirely. In Wang Guan's view, what Sapient must do is give AI reasoning ability by imitating the human brain.
Yann LeCun's World Model Theory
"In the Tsinghua Brain and Intelligence Laboratory, I will make bilateral advancement based on my knowledge of neuroscience and my understanding of System 2. For example, for the same problem, I first know how the human brain solves the problem, and then consider how to reproduce it with AI," Wang Guan told "Undercurrent Waves".
He then revealed that Sapient's underlying architecture has been mathematically verified: it will be a rare non-autoregressive model with multi-step computation, memory, and tree search capabilities. For scale-up, the team has also made preliminary attempts to combine evolutionary algorithms with reinforcement learning.
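Sapient has not published the architecture, so the following is only a generic illustration of what "multi-step computation with tree search" can look like: a toy best-first search over intermediate states, where `expand` proposes candidate next steps and `score` stands in for a learned heuristic. All names and logic here are hypothetical, not Sapient's actual method.

```python
import heapq

# Toy best-first search over reasoning states. `expand` and `score` are
# hypothetical stand-ins for a model's step proposer and learned value function.

def expand(state: str) -> list[str]:
    # A real system would propose candidate next reasoning steps here.
    return [state + "a", state + "b"]

def score(state: str) -> float:
    # A real system would use a learned heuristic; here: prefer more "a"s.
    return state.count("a")

def best_first_search(start: str, is_goal, max_expansions: int = 100):
    frontier = [(-score(start), start)]      # max-heap via negated scores
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            break
        _, state = heapq.heappop(frontier)   # take the most promising state
        if is_goal(state):
            return state
        for child in expand(state):          # branch into alternative next steps
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-score(child), child))
    return None

print(best_first_search("", lambda s: s.count("a") >= 3))  # 'aaa'
```

Unlike the greedy decoding loop above, a search like this can abandon a bad branch and return to an earlier state, which is the property the quote is pointing at.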
The hierarchical and recurrent working logic of animal brains
Given what people expect of AGI, perhaps only the human brain currently meets the standard. Evolving the large model in the direction of the human brain is therefore the path Sapient is trying to take.
The one who turned down Musk
If you have watched "Young Sheldon", Wang Guan's story should feel familiar: both are about a genius who shows himself early and is obsessed with the path he believes in.
Wang Guan was born in Henan in 2000 and started programming at the age of 8. When he was in high school, GPT-2 came out; it overturned not only many theories of deep learning at the time, but also Wang Guan's worldview. If a model could generate text like a human, did that mean AI could break the Turing test? If so, maybe he could build one algorithm to solve all the problems in the world.
Later he learned that this kind of algorithm had a name: "AGI".
In the world of a high school student at that time, such an algorithm could eliminate war, hunger, and poverty. Most urgently, of course, it could eliminate the college entrance examination. "At the time I felt that mechanical things like the college entrance examination should be left to robots."
That feeling had something to do with the hellish difficulty of Henan's college entrance examination. Wang Guan decided to take the recommendation route instead: he entered algorithm and informatics competitions, including the high-school edition of DJI's RoboMaster, where he won the championship by giving his robot a fully automatic algorithm. In the end he was recommended to Tsinghua University's School of Computer Science. On the first day of enrollment, the school held a mobilization meeting; teachers gave impassioned speeches from the podium, and the class's goal for the year was to take the grade's highest GPA in mathematics.
"What's the use of GPA for AGI?" Wang Guan thought. He transferred to Tsinghua's AIR institute to study reinforcement learning, then joined the Tsinghua Brain and Intelligence Laboratory to try to integrate reinforcement learning with evolutionary computation. He interned at Pony.ai and found that the biggest problem in autonomous driving is that decision logic has to be written by hand to tell the model how to decide. And if the model cannot make decisions on its own, then no matter how good its perception is, it cannot lead to AGI.
Finally, in his senior year, the arrival of ChatGPT gave him hope of solving problems with general capability, and Wang Guan built an open-source model called OpenChat. The 7B model trains on mixed-quality data without preference labels, needs neither the manual annotation nor the heavy parameter tuning of RLHF, and reaches a level comparable to ChatGPT on some benchmarks while running on consumer-grade GPUs. After release, OpenChat collected 5.2k stars on GitHub and has sustained more than 200,000 downloads per month on Hugging Face.
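For a rough sense of how such an open-source model is consumed downstream, here is a minimal sketch using the Hugging Face `transformers` library. The model ID and generation settings are assumptions for illustration; check the OpenChat project pages for the current checkpoint and its recommended chat template.

```python
# Minimal sketch: loading and sampling from a 7B chat model via Hugging Face
# transformers. The model ID "openchat/openchat_3.5" is an assumption for
# illustration, not a recommendation from the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openchat/openchat_3.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Which is bigger, 9.9 or 9.11?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```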
This small open-source model also crossed paths with Musk at one point.
After Grok was released, Musk reposted a screenshot of the model on X to show off its sense of humor. Asked "how to make cocaine", Grok replied: "Get a chemistry degree and a DEA license... Just a joke."
Wang Guan quickly got his own model to imitate the style and tagged Musk on X: "Hi Grok, I can be as humorous as you with such a small number of parameters."
Wang Guan told "Undercurrent Waves" that Musk quietly skipped that post but clicked into their profile, scrolled around, and quietly liked another post, one that read: "We need more than Transformers to get there."
Later, xAI invited Wang Guan to bring his OpenChat experience to its model development. To most people this looked like a great opportunity: xAI had money, computing power, even enough training data, plus generous pay and a Silicon Valley address at the forefront of AI. Wang Guan thought it over and turned the offer down. What he wanted was to overturn the Transformer, not follow in his predecessors' footsteps.
Wang Guan and Austin also met because of OpenChat. Austin studied philosophy in Canada, started a business in men's grooming, then another in cloud gaming. When the large-model boom took off in China, he returned home, collected offers from several model companies, and helped them recruit. That was how he found Wang Guan on GitHub; the two met online and hit it off.
Their resumes and backgrounds could hardly be more different, but the two share one thing: the future society they picture once AGI is realized. It is an ideal state, a place where humans have more freedom, and a key to many of the world's problems today.
The Future of Sapient
Since Wang Guan is a Tsinghua graduate who chose to start a company building a foundational model, the conversation inevitably turned to Yang Zhilin. Wang Guan's view stayed consistent: rather than keep working on the Transformer, better to open up a new route, just like his entrepreneurial idol, Llion Jones.
Llion Jones is one of the eight authors of the Transformer paper and co-founder of Sakana.AI. What he is doing at Sakana is a complete break with the Transformer's technical route, basing his foundation model instead on "nature-inspired intelligence."
The name Sakana comes from the Japanese word さかな, "fish": the image of a school of fish cohering into a single entity out of simple rules. Although Sakana has no mature product yet, within just half a year it closed a $30 million seed round and a $100 million Series A.
Since the AI wave began, capital's enthusiasm for AI applications has visibly cooled. On the model side, Austin told "Undercurrent Waves" that the domestic investors he meets come in two kinds: those who invested in the "Six Little Tigers" and then stopped looking, and those who have begun, step by step, to explore possibilities beyond the Transformer.
As the "first person to try something new", it is not easy to get start-up capital. Before Sapient describes its technical advantages and business vision to investors, it first needs to explain three issues clearly. The first is the defects of GPT, including the instability of simple reasoning, the inability to solve complex problems, and hallucinations. The second is that the current AI application scenarios are very good, but the technology cannot adapt to the needs. For example, Devin, with a 13% accuracy rate, makes it impossible to achieve the expected effect. The third is the current time node. The market has expectations for the future of AI, and the infrastructure such as computing power clusters is complete. Funds are only hesitant because they are trapped in downstream problems that GPT cannot solve.
Even with initial funding, Sapient still faces the talent fight. Competition for AI talent in Silicon Valley has reached a nearly crazed state: Zuckerberg personally wrote to DeepMind researchers inviting them to jump to Meta, and Google co-founder Sergey Brin personally called to discuss a raise and benefits just to retain an employee about to leave for OpenAI. Beyond full sincerity, ample computing power and tempting salaries are simply prerequisites.
By one count, median total compensation at OpenAI (stock included) has reached $925,000. Austin told "Undercurrent Waves" that Sapient's core members are researchers from DeepMind, Google, Microsoft, and Anthropic who have led or contributed to well-known models and products including AlphaGo, Gemini, and Microsoft Copilot. The ability to assemble such a diverse, global team is itself one of Sapient's core advantages.
For a team taking on GPT, the difficulties do not end there: Sapient also has to choose its market. It will concentrate on overseas markets, especially the United States and Japan. The case for the United States needs no elaboration; Japan has its own core advantages. The North American AI market is active, but competition in its generative-AI software space is ferocious. Japan, by contrast, offers complete infrastructure and high-quality talent, and training data rooted in a non-Western culture may become the catalyst for the next technical breakthrough.
Wang Guan, meanwhile, stays heads-down on Sapient. His WeChat Moments feed is empty. His avatar is a diagram of a deep learning framework, blurry as a textbook illustration. His cover image is plain white text on a black background: "Q-star", the rumored OpenAI project focused on AI's logical and mathematical reasoning.
Wang Guan and his team are pushing toward the next milestone: releasing the new model architecture and running fair benchmarks of its reasoning ability, so that people can see a qualitative leap relative to its parameter count.
However long that day takes to arrive, one thing is certain: the era of the Transformer's dominance is gradually passing.