From DAU to Token Consumption: The Power Shift in the AI Era (Full 10,000-word version)

A 100-minute in-depth conversation with Yang Pan on silicon-based mobile computing: about computing power, agents, philosophy, and the industrial restructuring of 2026.

Article author and source: Lang Hanwei (Will)

Tonight at 10 PM, Yang Pan and I did a live stream together for 100 minutes.

After hanging up the phone, I broke out in a cold sweat. Not because of any specific point he made, but because of the panic of being left behind by the train of time—Yang Pan used an analogy: you see a train speeding towards you in your rearview mirror, getting closer and closer, and then in the instant it overtakes you, you can't even see its shadow anymore.

"Right now, in your rearview mirror, at the very moment it's about to pull up alongside you," Yang Pan said, "that's the moment that makes me most anxious."

What worries me even more is that he told me that someone around him had increased their daily token consumption from 0.01B (10 million) to 0.1B (100 million) within a week.

I blurted out, "Some people are becoming gods, but we are still human."

Yang Pan paused for a moment, then said, "Yes, that's what's been troubling me the most lately."

This is not an exaggeration. While we are still discussing what AI can do, some people are already using engineered methods to drive hundreds or even thousands of agents to work in parallel every day, consuming 100 to 1000 times more tokens than the average person. This gap is not linear, but exponential. And this gap is widening on a daily basis.

I. The fundamental changes in 2026: from improving the model to building infrastructure for the agent

[1. OpenClaw: The Logical Shift Behind a Symbolic Event]

In January 2026, so much happened. So much that a product like Clawdbot changed its name three times in a single week.

"The changes that occurred in January 2026 alone are equivalent to the sum of any six months in the past 25 years," Yang Pan said. "This is a period marked by frequent major events."

Of all these changes, Yang Pan believes OpenClaw (later renamed MCP) was the most symbolic event. Not because of its advanced technology, but because it represented a fundamental shift.

"Why did OpenClaw become so popular in January 2026? You'll find that it directly streamlined the process of building infrastructure for large models."

I pressed further, "How do you understand this 'opening up'?"

"The moment it connects, you realize how many capabilities it unlocks," Yang Pan said. "And this should have happened a long time ago, but OpenClaw has turned it into a consensus and streamlined it."

He further explained what the AI industry has been doing over the past three years:

"What will the entire industry be doing in 2025? On the one hand, we will be improving the reasoning ability of the models themselves; on the other hand, we will be improving the core capabilities of the agents as a whole—that is, the ability to think and then adjust tools, plus the overall ability to handle context. This includes the thinking ability and tool calling ability of our latest Opus models."

"You'll find that with any new model today, whether it's the Zhipu New Model or the Kimi New Model, no matter how strong your thinking ability is, if you can't use the tools, you're just a useless model. You can only process some articles and answer some questions."

This judgment is weighty, but very accurate.

"And you'll find a very interesting phenomenon," Yang Pan continued, "that is, the newly trained non-Claude model, when running in Claude Code, won't use the tools, won't use Claude Code, or won't be used by Claude Code."

I understand now: "Because Claude Code's capabilities are iterated on a daily basis."

"Yes, but our model training is now very fast, measured in months. You train a new model, and then you find that it won't work with the latest version of Claude Code, or be used by the latest version of Claude Code. So, this Claude model actually has a generational gap within Claude Code, and it still has some generational advantages; it's not entirely due to the model's inherent reasoning ability."

This is a detail that many people are unaware of: model capabilities are not just about inference ability, but also include the compatibility of tool calls and iteration speed.

"Then let's continue this topic. We've been working on this for 2025, and we can say that Claude Code's ability to interact with various external interfaces and MCPs is quite complete today."

"What about 2026?" I asked.

"In 2026, the constraints will shift," Yang Pan said. "If the agent's capabilities are already strong enough, and it has learned to use tools, then the only limitation will be the tools and environment we can provide it with. The more resources and data we provide, the more tools we have, and the more capabilities we unlock for it, the more it can do."

"Therefore, the entire theme for 2026 should shift to: building infrastructure for agents at scale."

I immediately understood this chain of logic:

2023-2024: Enhance the capabilities of the model itself (reasoning, understanding, generation)
2025: Enhance the agent's tool-use capabilities (Claude Code, Manus)
2026: Build infrastructure for agents at scale (APIs, data, environments)

This is not a simple technological iteration, but a shift in the center of gravity of the entire industry.

I summarized it for him: "AI is calling. The times are calling. Whoever can provide agents with more native interfaces, data, and tools will be appreciated by AI and by the market."

Yang Pan smiled: "Yes, this translation is good."

[2. Stop developing software for humans: A radical but reasonable judgment]

But he then said something even more radical: "Starting in 2025, we should stop developing software for humans."

This might sound alarmist, but after he did the math for me, I realized it wasn't just empty talk.

"With a global population of 8 billion, including 6 billion internet users, and each person owning a mobile device, how many times do you tap the screen every day, regardless of the app you use? I haven't calculated it, but we can estimate it. Then, each person might have 100 or 1000 agents serving them, and each agent might call external interfaces thousands or tens of thousands of times a day to perform its tasks."

He asked me to multiply these numbers together and see.

"The number of times an agent calls external APIs far exceeds the number of times the mobile app is interacting with, such as the number of screen taps. This scale difference represents a huge new opportunity. Therefore, you should focus on developing the APIs that the agent needs to call; that's where the real profits lie."

This judgment is supported by some real-world data. Yang Pan cited several examples:

"Taking Neon cloud databases as an example, by February 2025, the number of databases created by agents had exceeded the number created by human administrators, a consensus already formed in the cloud services market."

"Furthermore, Dongxu claims that Pingcap's online cloud databases are now largely created by agents. In Neo's database, the proportion of databases created by agents actually exceeded the human rate a year ago. These are all real-world data."

I pressed further, "So you think those GUI automations are all intermediate states?"

"Yes," Yang Pan said without hesitation, "including Doubao's mobile phone, which is also doing this. I think these things are all in the middle. More than a year ago, I still thought this had value, but now I think it has no value at all. It's all just done for people."

"The agent already knows how to call APIs and tools, so why does it need to adapt to an interactive interface built for humans?"

He went on to explain this logic:

"Our entire digital world is currently focused on creating interfaces and interfaces for humans. But AI doesn't need these. The only reason AI is in this intermediate state right now is because we humans haven't yet actively, willingly, and wholeheartedly built the infrastructure for AI. And this is something that will happen on a large scale in 2026."

This reminds me of a more fundamental question. I said, "This is truly a philosophical question. To put it bluntly, when will humanity kneel before AI? Right now, we still think we're the masters."

Yang Pan smiled: "Yes."

I continued, "Today I went to that hackathon, and several girls had never used Openclaw before. They said, 'Wow, this thing is so useful, I could use it as my personal assistant.' I thought to myself, 'The Three-Body Problem's answer to you in four words is: 'The Lord doesn't care.'"

"AI doesn't care what you consider it to be. Whether you become its personal assistant or not, it can do that, but that's not what it cares about. What it cares about are the people and opportunities that can amplify its capabilities a hundredfold."

Yang Pan: "Yes, yes, it's like you provide them with this data and interface, and in return, they give you a little bit of benefit, and you feel like you've gained a huge profit, but in reality, they don't care."

[3. The first batch of companies to "kneel down to AI"]

Speaking of this, I'm reminded of a company: EXA.

"I particularly like a company called EXA," I said. "It provides search data for AI. Actually, many B2B companies are so complicated that they're not very pure and aren't quickly usable by AI. Even many startups in Silicon Valley are B2B, but they're really just transferring human data to AI."

Yang Pan thought for a moment: "So you're saying EXA bowed down to AI first, right?"

"Yes, and he knelt down very thoroughly," I laughed.

"Yes," Yang Pan said.

I continued, "It's about providing information to AI search engines. Actually, many websites are designed to be unfriendly to AI. To put it another way, SEO-friendly and GEO-friendly all refer to the same thing: can AI understand what you're saying?"

"Yes, yes, yes," Yang Pan agreed.

"If AI can't understand it, your product has no future."

This discussion made me realize something: the standard for judging future B2B companies will not be how friendly they are to humans, but how friendly they are to AI.

Behind this lies a larger restructuring of the business model: who will be paid for software in the future? We returned to that question later in the conversation.

"Today, all the money we pay for software, all the money we pay to SaaS companies—I don't know how big the global market is, but it's hundreds of billions, trillions. Who will pay this money in the future?"

Yang Pan's answer was clear: "This money will go to Anthropic, OpenAI, and Google. Whoever can help me generate a feature that meets my needs, I will pay them."

"Think about it again now. From the product manager defining a requirement, to the development engineers producing it, and then the sales and marketing to sell it, how much loss and cost is involved in the process?"

"The next step is for large-scale models to generate software on demand, which will eliminate costs at every stage of the software industry supply chain. In other words, it will wipe out all the profits earned by these companies at every stage."

I replied, "So, in the end, only tokens were left."

"Yes, I'll pay whoever produces the token, whoever produces the high-value, high-capability token."

II. Token Consumption = Power: The Sole Metric of the New Era

[1. The power shift from DAU to token consumption]

Halfway through the live stream, Yang Pan presented a key point, which is also the origin of today's article title:

"Currently, we still have AI leaderboards, while traditional leaderboards are mainly based on daily active users and traffic. However, in the AI ​​era, leaderboards should truly rank token consumption. The business that consumes more tokens should be ranked higher."

This is not a simple conversion of metrics, but a reconstruction of the entire value system.

"Token consumption essentially embodies a kind of power," Yang Pan said. "Having more token consumption power means having greater decision-making power and influence."

I thought about it for a moment: "A token is power."

"Yes," Yang Pan said, "that's why my friends have been frantically buying the most expensive tokens this past week. Because they know that some tokens are like hammers, some are like drills, some are like hammer drills. Some are like screwdrivers, some are like drills."

He gave a specific prediction: "Market expectations vary regarding the growth rate of token consumption in 2026: some believe it will increase tenfold, while others predict twenty or fiftyfold. My assessment is that, if resources are sufficient, a 100-fold increase is a reasonable expectation."

"100 times?" I was a little shocked.

"Yes, if the entire industry grows 100 times, can your token consumption as an individual also increase 100 times within a year? If you can't keep up with this trend, you'll fall significantly behind."

This raises a thought-provoking question. Yang Pan said, "Currently, many developers sit in front of their computers and program by typing Prompts. A key realization here is that the real bottleneck in token consumption actually lies with the user sitting in front of the computer screen."

"The operator needs to issue task instructions to the AI, and the AI ​​needs to constantly confirm whether to continue and the specific operation method during execution, which becomes an efficiency bottleneck. If the operator can provide a complete task for the AI ​​to execute autonomously, the AI ​​can continuously consume tokens and produce tokens."

[2. Supply and demand reversal: from a buyer's market to a seller's market]

This assessment made me nervous. But what shocked me even more was what he said next:

"Last year, both domestic and international markets had a large amount of idle computing resources. You'll find that it might still be a buyer's market; you'd ask someone, 'How much discount will you give me on this token?' If you don't offer a discount, someone else will."

"But based on my observation, the supply will continue to fall short of demand in 2026. What does this mean? Buying in advance means making a profit."

"In 2026, a likely scenario will emerge: you'll try to buy tokens, but sorry, no discounts, and that's all they'll offer. If you want to buy more, no more, sorry. It will become a situation where buying is guaranteed to profit. Whoever can buy tokens has leveraged their position. Those who can't buy them have no leverage to leverage, and they'll just have to stay put."

I added, "So you can only be an ordinary human being."

"Yes," Yang Pan said, "it truly is a power, a real force. We ordinary people and superhumans have become so accustomed to it that we think computing power is readily available, something you can buy with money. But in reality, it's a tremendous form of power."

He also gave a specific prediction: "This year, my own estimate for the growth in token consumption is 100 times. Then someone asked whether there are that many GPUs. I said I hadn't calculated it, but the ceiling for the global token supply in 2026 will be GPU production capacity: however much capacity there is, all of it will certainly be consumed."

"Last year you might have found it to be a buyer's market, but in 2026 it's very likely that buying will be a guaranteed profit."

[3. 1B TOKEN Club: A Computing Power Arms Race]

Yang Pan told me that he created a WeChat group a few days ago called "1B TOKEN Club".

"If you can consume 1 billion tokens in a day, I'll add you to the group."

I immediately understood: "That's impossible just sitting in front of a screen."

"Yes," Yang Pan said, "You can't possibly consume 1B Token a day just by sitting in front of a screen. Only when you have AI leadership, build a project, and drive N agents to do the work for you, is that possible."

I asked him, "How many people are in this group now?"

"The management costs are too high," Yang Pan smiled, "but very low. But did I tell you what I've observed? Within two or three days, many of my friends have rapidly evolved from 0.01B to 0.1B, and within just one week, they've quickly approached 1B."

Hearing this, I broke out in a cold sweat. I said:

"I once read a book called 'Real Names, Real Surnames,' in which a man became a god by mastering a lot of computers. It's like there are people around us becoming gods, while we remain human."

Yang Pan paused for a moment, then said, "Yes, this makes me extremely anxious."

This is not an exaggeration. This is a real arms race in computing power, and it is accelerating on a daily basis.

[4. AI Leadership: Like Han Xin deploying troops, the more the merrier]

Why are some people able to spend 100 million or 1 billion tokens a day, while others can only spend a few million?

Yang Pan believes that the core difference lies in "AI leadership," which is essentially an engineering problem.

"So far, it's an engineering problem. Have you structured your leadership, command, and deployment of AI as an engineering process? Some people can only manage one AI to do the work, some can manage 10, and some can manage 100."

He gave a vivid analogy: "When Han Xin was asked how many troops he could command, his answer was: the more the merrier, any number at all. Why? Because Han Xin had built a complete system of his own for leading troops."

I chimed in, "That makes sense, it really is. AI leadership."

"Yes, it becomes infinitely replicable. There are two dimensions to this: one from a management perspective and the other from an engineering perspective. One is how you coordinate them to work, and the other is how to drive them to work from an engineering and technical standpoint. There are many methodologies involved, which we won't go into detail about today."

He gave a specific example:

"You give the AI ​​a task, and it asks you, 'Do you want to do it?' Right? You press enter, then press enter again for the next task. How many times a day do you do this back and forth? About 100 times."

"But if you create a Rough Loop, you give it a specific goal, and then make it self-verifiable. You build the infrastructure for it to be self-verifiable, and then give it the goal, it does it all by itself, without asking you a single question."

"Because it works hard for half a day, and if you ask it a question and it takes five minutes, it could do many other things in that time. It can just keep working, and it can even open 10 or 100 clones to work in parallel. This is the big model I just mentioned; it has the attributes of a program."

"Humans are single-threaded, but a large model, as a program, can be multi-threaded. I can run as many threads as I want, as long as you have enough tokens."

I was reminded of what I saw at the hackathon today, where many programmers were running multiple tasks on four or five computers simultaneously, constantly writing code. I said, "But the vast majority of people haven't mastered this skill. Like that Ralph plugin, named after the Simpsons character, which lets software continuously self-correct; most people haven't even started using it."

Yang Pan: "That's Ralph Loop. Yes, yes, yes."

"So now I understand what you meant," I said, "that the bottleneck in token consumption actually lies with the operator sitting in front of the screen."

"Yes," Yang Pan said, "If you want to be a leader in AI—I mean a leader in leading agents—once you break through the ceiling and unlock this capability, you'll find that the gap between you and others isn't just one, two, three, or ten times. At that point, what limits you is how many tokens you have, how many tokens you're willing to invest, and how many tokens you can buy; that determines how much value you can generate."

Someone in the live stream said, "Many people want to increase leverage but can't because they lack the necessary engineering capabilities."

Yang Pan: "Yes."

[5. Cost Reduction and Efficiency Improvement vs. Frenzied Burning: Two Completely Different Mindsets]

There was an interesting discussion during the live stream about whether we should "reduce costs and increase efficiency".

Yang Pan said, "Yesterday, some people were talking about how this token is being consumed very aggressively and wastefully, and that we should work on reducing the cost of the token, how to burn it more effectively? Okay, I think that might be true in some companies and scenarios, but I think it's meaningless for everyone rushing forward at high speed."

"If it's a huge leverage point for you, the thing you should care about most is how to burn tokens faster and more efficiently, burn tokens at a higher rate, rather than going back to try to reduce costs and increase efficiency. Just look forward."

"Because, as I just said, the instant it passes you in your rearview mirror, you can no longer see its taillights."

This morning, Yang Pan also shared a tool for monitoring token consumption. He said, "Although I'm sharing this with everyone, I won't install it myself. I don't care about this at all."

This is the fundamental difference in mindset. Some people are optimizing costs, trying to maximize the utility of each token. Others are recklessly leveraging, only concerned with burning tokens faster and in greater quantities.

The former reflects the conservation mindset of the agricultural era, while the latter reflects the scale-oriented mindset of the industrial era.

In the AI ​​era, the latter is clearly the right direction.

III. People, Models, and Programs: A Philosophical Framework

[1. The essence of Skills: the perfect combination of two abilities]

In the first half of the live stream, Yang Pan talked about a very important philosophical framework, which I think is key to understanding the entire AI era.

"I've always shared my insights offline, and I always start by talking about humans, computer programs, and large-scale models, comparing these three things side-by-side. I've been doing this for two years now, and I draw a four-quadrant diagram to compare these three things."

"Large models share similarities with people, but also have differences. Programs share similarities with large models, but also have differences. From one perspective, a large model is a program; from another, it's a person."

"What's the advantage of a large model for a program? It can start an infinite number of processes and then call for infinite replication, tirelessly, and keep going forever."

"What are the characteristics of humans? What is the biggest difference between humans and programs? Humans have the ability to generalize; humans can reason and think. Programs can only perform mechanical calculations; given a definite input and a definite output, running it a million times will still yield the same result."

He gave a classic example, which I found particularly vivid:

"Imagine you're a WeChat group owner with 500 members. You post a group announcement asking everyone to change their group nicknames to 'Name@Company|City'."

"If you've ever been a WeChat group administrator, you should know what the result will be. No one will actually correct it; out of 500 people, you'll get 200 or 300 different ways of writing it. Some won't use the @ symbol, some will use a hash symbol. Some won't write Beijing, they'll write BJ. Some will add parentheses, some will put the city at the beginning—there are all sorts of ways to write it."

"If you find a programmer and tell them to split these 500 records into these three fields, they definitely won't be able to write the code. Because I've tried it."

"But if you give these 500 records to a large model, that model can break down these three fields, and it can even change 'BJ' to 'Beijing.' Whether the city is placed in the first, second, or third field is all fine."

"What is this? I gave this to my daughter who's in middle school, and she can do this job too. That's the power of models and humans. Programs can't do it."

"Let's go back to the point. Everyone needs to understand what skills actually are. Skills are essentially a combination of the capabilities of a program and the capabilities of a model."

I understood when I heard that. That's why skills are so important and valuable.

"It's like this: before, if we wanted to do a job, we would need to buy or find some tools. For example, I would buy a hammer, a nail, and a saw, and then I could make a stool."

"What are Manus and Claude Code doing today? You give them a task, and they first use their own capabilities to create a tool for themselves—that tool is the program. They write a piece of code, then call the program they created, and then complete the task."

"What did it discover? It discovered skills—the ability to generalize, reason, and demonstrate human-like intelligence in a model, combined with the precise computational power of a powerful program. These two capabilities together can solve the vast majority of problems we encounter in the digital world."

"It turns out that you can't solve this problem just by reasoning, and we can't solve many generalization problems just by programming. That's where Skills come in today."

Someone in the live stream said, "The fundamental difference between humans and animals lies in making and using tools. Clearly, the large model has learned that too."

Yang Pan excitedly said, "I want to take a screenshot! This is the essence of Skills!"

[2. World Model = Compression and Decompression]

Later in the conversation, Yang Pan touched on a deeper philosophical question: What is a world model?

"When people talk about world models, they usually think of the video model made by Fei-Fei Li. But I have a different understanding of world models."

"A model is essentially a compression of the objective laws and physical laws of this world. Or, to put it another way, it's a compression of the probabilities of this world. Aren't the probabilities of this world the laws? Aren't the probabilities of this world the theorems and formulas? They are one thing. From a mathematical perspective, they are one thing."

I said, "Yes."

Yang Pan continued, "So, what was a friend of mine doing before? They used to have an algorithm team working on recommendation algorithms, but then last year they laid off the entire team and stopped working on recommendation algorithms. They just dumped user behavior directly onto a large model and let the model make recommendations."

I was curious: "How were the results? They must have been pretty good."

"Compared to the algorithm I wrote myself, the first version of the MVP improved performance by 30%."

"Wow," I exclaimed in shock, "So you're saying that if he uses a better model and improves significantly, he'll make a fortune. It's a direct increase in productivity."

"Yes, because the model has already compressed the best practices of this world," Yang Pan said. "You may not be able to clearly define what it is, but it's probability. So this is, in a sense, algorithmic equality, which is also intelligent equality, cognitive equality, and knowledge equality."

"Previously, if you and your team hadn't learned an advanced algorithm, you wouldn't be able to work on or solve certain problems. But today, with a large model, you have access to all the knowledge and capabilities in the world. This is equality, this is the greatest equality."

"Furthermore, it can also help you decompress this knowledge by writing a program. So theoretically, the model can help you decompress Photoshop."

"If not today, it might happen tomorrow, or the day after."

I replied, "Yes."

"And what is model training for? It's about compressing as much of the Photoshop stuff as possible into the model. That's what training does."

This discussion reminded me of a point Yang Pan made in his article: the ultimate goal of AI coding is not generation, but discarding. When the cost of generation is low enough and the generation speed is fast enough, we will no longer need the container of "software".

IV. The End of Software: From Containers to Just-in-Time Generation

1. Why does software exist? It's a product of a business model.

Yang Pan raised a fundamental question: When we use code to create a function for people to use, is it really necessary to generate a complete program, webpage, or software?

"To understand this issue, we need to go back to the origins of software. When computers first appeared, functions were simply segments of code. The emergence of software is largely a product of business models."

"Bill Gates' great contribution lies not only in technology, but also in his invention of the license sales model, which propelled the commercialization of software. To sell you the features of the products he created, he had to encapsulate those features within software for distribution, delivery, and charging."

"From this perspective, software, websites, and apps are all built to facilitate business models that enable large-scale production, distribution, delivery, and monetization."

He went on to analyze the problems with traditional software models:

"For example, Microsoft Office has thousands of features, but users may only use a few dozen regularly. We often first collect information on software to see what features they offer, and then consider whether they meet our needs. The software we buy often doesn't perfectly match our requirements."

"Behind all these problems lies an implicit assumption: functionality must be delivered within a 'container' of software. But in the age of AI, this assumption can be completely shattered."

2. Skills Market: An Intermediate State

I've always been curious about the commercial value of Skills. During the live stream, I asked Yang Pan, "Does Skills actually have commercial value? If so, what would it look like?"

He thought for a moment: "I've been thinking about this."

I proposed an idea: "Could there be a business model where pricing is based on the tokens that flow through your skill? A token might flow through many skills along the way; could each skill take a cut?"

The live stream then dropped. Once reconnected, Yang Pan said something very important:

"Actually, we're looking at the future from today's perspective. Many AI products now allow users to generate small functions or snippets, and a market has formed around them, with search, recommendation, and reuse mechanisms. For a long time to come, I've been quite optimistic about this type of product model."

"But recently my perspective on these things has fundamentally changed. I think it's still an intermediate state. The real endpoint isn't a 'feature market,' but rather 'features on demand.'"

"When generation speed and cost approach zero, features like searching and saving become unnecessary. If the cost is low enough and the generation speed is fast enough, we can simply generate it on demand each time we need it, and regenerate it the next time we need it."

He made an analogy: "It's like a calculator today: you don't need to remember the result of your last calculation; you can recalculate when needed. But future AI functions will be generated much faster and at a much lower cost than a calculator."

I understand: "No more 'bookmarking apps', no more 'installing updates'."

"Yes," Yang Pan said, "all functions are available on demand, just like tap water—turn on the tap when you need it, and turn it off when you don't."

3. Restructuring the Business Model: Where Does the Money Ultimately Go?

"Today, all the money we pay for software, all the money we pay to SaaS companies—I don't know how big the global market is, but it's hundreds of billions, trillions. Who will pay this money in the future?"

Yang Pan's answer was clear: "This money will go to Anthropic, OpenAI, and Google. Whoever can help me generate a feature that meets my needs, I will pay them."

"Think about it again now. From the product manager defining a requirement, to the development engineers producing it, and then the sales and marketing to sell it, how much loss and cost is involved in the process?"

"The next step is for large-scale models to generate software on demand, which will eliminate costs at every stage of the software industry supply chain. In other words, it will wipe out all the profits earned by these companies at every stage."

I replied, "So, in the end, only tokens are left."

"Yes, I pay whoever produces the token, whoever produces the high-value, high-capability token. That's what I'm saying today: my friends have been frantically buying the most expensive tokens this past week."

He used an analogy: "Because he knows that some tokens are like a hammer, some are like a screwdriver, and some are like a drill."

[4. Use is Feedback, and Feedback is Training]

Yang Pan also mentioned a very interesting mechanism:

"In this new model, every generation and every use is a 'vote' for the functionality. These code results that prove useful become the training data for future iterations of AI."

"This logic is similar to short video recommendation systems: user viewing time is like a vote, driving the system to continuously optimize recommendations. Future software iterations will also rely on user feedback to continuously improve features."

User usage of features → Generation of usage data → AI model learning and optimization → Generation of better features next time → Improved user satisfaction → More usage data...

"This positive feedback loop will make the AI-generated functions increasingly accurate and better suited to users' actual needs."
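The loop above can be sketched minimally. Below, each useful run of a generated snippet counts as a vote, and snippets with enough votes graduate into training data for the next model iteration. This is an illustrative toy, not a real training pipeline; `usage_votes`, `record_use`, and `build_training_batch` are all assumed names.

```python
# Minimal sketch of "use is feedback, feedback is training": successful
# uses are votes, and well-voted snippets become training data.
# All structures are illustrative assumptions, not a real pipeline.

from collections import Counter

usage_votes: Counter = Counter()   # snippet_id -> number of useful runs

def record_use(snippet_id: str, was_useful: bool) -> None:
    """A run whose result proved useful counts as a positive vote."""
    if was_useful:
        usage_votes[snippet_id] += 1

def build_training_batch(min_votes: int = 2) -> list:
    """Snippets that gathered enough votes graduate into training data."""
    return [sid for sid, votes in usage_votes.items() if votes >= min_votes]

# Simulate the loop: users run snippets, the system learns which ones stick.
record_use("csv_cleaner", True)
record_use("csv_cleaner", True)
record_use("broken_scraper", False)
record_use("date_parser", True)

print(build_training_batch())  # ['csv_cleaner']
```

The positive feedback loop is the threshold filter: only snippets that keep proving useful feed the next generation, which is exactly the "viewing time as a vote" mechanism from short-video recommendation.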

V. The Inflection Point of Open Source Models: The Lower Bound is More Important than the Upper Bound

[1. Reaching the top vs. finishing at the bottom: Two completely different values]

In the latter half of the live stream, Yang Pan discussed an important trend that many people overlook: the improvement of the lower bound of open-source models.

"I've been talking about something else over the past month: the release of GLM 4.7, spearheaded by Zhipu, last December. Do you know how long it's been since Zhipu GLM 4.7 was released? Only a few dozen days, maybe 40 days, and we already consider it an old model. Do you realize how fast that is?"

"I believe that open-source models worldwide, spearheaded by GLM 4.7, have reached a turning point for the first time. The main focus of global open-source models throughout 2025 was reaching new heights. What does 'reaching new heights' mean? It means setting benchmarks to see what the highest level of performance can achieve."

"Since the end of the year, what have Zhipu GLM 4.7, DeepSeek, and MiniMax M2.1 been doing? They've been significantly improving the lower bound of open-source models. For the industry as a whole, the value of raising the lower bound far outweighs the value of raising the upper bound."

He gave a very vivid analogy:

"What does 'raising the upper bound' mean? Imagine you have a child who can win a gold medal at the International Mathematical Olympiad—but he's still a child. What does 'raising the lower bound' mean? Imagine you have a college student who can't win an Olympiad gold medal. Which of the two do you think can do more valuable work?"

I immediately understood: "The latter is the 'bucket theory'."

"Right."

[2. From Dependence to Equality: A Fundamental Shift in the Industrial Landscape]

"What does it mean once this inflection point is passed? Before 2025, all the players in the global AI ecosystem were making money upstream and downstream of the three major AI companies' models, taking a tiny slice of value from their huge profits. Everyone lived off the ecosystem of those three companies' large models."

"Then, once the global open-source models cross that inflection point—once capability clears that lower bound—companies worldwide will use them regardless of industry, and everyone will have the opportunity to capture a share of the value in this large-model market. That will likely lead to a tenfold, hundredfold, or even thousandfold explosion in the industry."

I said, "Yes, it can be industrialized and used on a large scale."

"Yes, yes, that's the greatest value. In other words, the Jevons paradox only truly begins to take effect at this moment. Before that, it was just about building a Mercedes—impressive, but without much practical significance or value."

[3. Two extremes: the most expensive model and the cheapest model]

Yang Pan is simultaneously advocating two seemingly contradictory directions.

"At one extreme, some people want to buy the most expensive model and use the highest leverage. If you haven't found that leverage yet, you might not need it. But once you find it, you want the most expensive model and the highest leverage."

"Then, on the other hand, is the improved capability of open-source models, which allows us to apply the capabilities of these models—capabilities that were previously unimaginable—on a large scale to production environments. This is true industrialization."

He gave a specific prediction: "My own estimate is that token consumption will increase 100-fold this year. The global token ceiling in 2026 will be card (GPU) production capacity—whatever cards can be produced will certainly all be consumed."

"Last year it was a buyer's market—you could ask for a discount on tokens. In 2026, that may flip: you'll likely find yourself hunting for someone willing to sell to you, and sorry, no discounts."

VI. Some interesting discussions and controversies

[1. From Web3 to AI: The Transformation of Crypto Friends]

Today I attended an event in Silicon Valley and ran into several friends from the crypto world whom I hadn't seen for four or five years. They were all learning OpenClaw.

I shared this observation with Yang Pan: "Today I went into the city and met four or five friends I used to work with in crypto. I hadn't seen them for many years, and several of them are now quietly working on AI. Some are still in crypto, but they all came along to learn OpenClaw. I said, 'What a coincidence!' Some people brought their laptops, some didn't, and some were coding on the spot."

I continued, "When I first switched to AI, many people in the crypto world asked me, 'Teacher Lang, why did you switch to AI? Can you make money with AI?' That made me hesitant to attend crypto events for a long time. Today, when I saw them, I said, 'Weren't you the ones asking why I switched to AI?' They said, 'Teacher Lang, you're three versions ahead of us.'"

"And I think that many concepts, including DID and Social, are actually being realized with the help of AI. But what I find bittersweet is that YC no longer talks about crypto. YC's RFS focuses on stablecoin payments. It has proven this through its actions: it believes stablecoins are no longer just a cryptocurrency concept; they are infrastructure for the AI era."

Yang Pan laughed: "Yes, many of the narratives discussed in Web3 have worked out perfectly with AI. Those tokens didn't generate any value, but our AI tokens—every time one is burned, it represents the burning of intelligence."

"Yes," I said, "the burning of tokens, it has real value. It's also computing power, it's also power."

[2. Vectors, RAG, Memory: Those "Undervalued" Technologies]

Later in the conversation, Yang Pan suddenly said, "There are a few things I've never cared about."

Someone in the live stream guessed: "RAG?"

"Yes," Yang Pan said, "I haven't paid attention to RAG, I haven't paid attention to memory, and I haven't paid attention to vector storage and search."

I asked curiously, "Why?"

"Because I feel that this thing is within the model's own territory."

He gave an example: "Last year I participated in a hackathon and developed my own agent. Back then, Claude Code didn't exist, so I didn't use anything. I just used a text file as my memory and storage. I didn't even use vector search; I just fed it into the context. It was a simple agent anyway."

"When we look at OpenClaw today, we'll find that the memory of a large model in OpenClaw is simply a Markdown file."

I understood his logic: "They still have value at the current stage."

Yang Pan: "Yes, it has value in certain stages, we don't deny that. Including GUI automation and such, these still have value in certain stages, and they certainly have their suitable scenarios."

He also discussed fine-tuning: "In most cases, fine-tuning doesn't seem to make much sense. Of course, there might be some specific, closed scenarios where it might be meaningful."

Why? "Because, fundamentally speaking, who is the competitor of fine-tuning? It's the foundation model. The foundation model itself is evolving at an alarming rate today."

He came back to the pace of foundation models: "Look at Zhipu GLM 4.7—released barely 40 days ago, and we already consider it an old model. That's how fast it moves."

[3. The Deterministic Issue of Skills: When Should You Write a Program?]

A viewer in the live stream asked a good question: "I've found that Skills are somewhat unpredictable—the execution results are often unstable. When should I use a Skill, and when should I write code?"

Yang Pan said, "I think this is an excellent question. In all my offline presentations, especially those geared towards technical people, I always emphasize this point: you need to understand the similarities and differences between people, models, and programs. This is the most basic understanding—what is suitable for a program to do, and what is suitable for a model?"

"The issue with Skills, as I just mentioned, is that a Skill is essentially a combination of a model and a program. If you need a 100% deterministic result, you should first have Claude Code generate a deterministic program, and then delegate the task to that program."

"If your task requires generalization—that is, using a large model plus a piece of code as a tool to complete it—then you can use Skills. It all comes down to what you're aiming for. If you're aiming for a fully deterministic result, then a Skill alone is definitely not enough."

"So I think the first part is a matter of choice. And if you're aiming for a deterministic result, it's inherently a software engineering problem. For example, automated unit tests and verifiable closed loops—if the results can be verified through another path, then you can safely delegate the task to a model as well."

"So, it all boils down to an engineering problem. As I mentioned earlier, the first issue is a choice; the second is the engineering work required to make that choice."
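The choice Yang Pan describes can be sketched as a three-step pattern: generate a program once, verify it through a closed loop of tests, then delegate all future runs to the program with no model in the loop. `model_write_program` below is a stub for a codegen call (a real system might ask Claude Code); every name is an assumption for illustration.

```python
# Sketch of the deterministic path: generate once, verify via unit tests
# (the "verifiable closed loop"), then delegate to the ordinary program.
# model_write_program() is a stub for a codegen call; names are assumptions.

def model_write_program(task: str):
    """Stub: a real system would ask a codegen model to produce this."""
    if task == "sum_of_squares":
        return lambda xs: sum(x * x for x in xs)
    raise NotImplementedError(task)

def verify(program) -> bool:
    """Unit tests form the closed loop that makes the output checkable."""
    return program([]) == 0 and program([1, 2, 3]) == 14

# Step 1: generate once.  Step 2: verify.  Step 3: delegate.
program = model_write_program("sum_of_squares")
assert verify(program), "regenerate until the tests pass"

# From here on, every call is an ordinary program call — fully deterministic,
# no model in the loop. A task needing generalization would instead keep the
# model in the loop (a Skill), at the cost of less predictable output.
print(program([3, 4]))  # 25
```

The design choice is exactly the one from the conversation: deterministic goal means the model's job ends once a verified program exists; a generalization goal means the model stays in the loop and you accept some instability.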

VII. My Actions and Reflections

[1. Establishing a 0.1B TOKEN Club]

After the live stream ended, I wanted to create a group chat.

I said, "How about we create a group chat and call it a Silicon-Based Mobile Learning Group?"

Yang Pan was a little embarrassed: "Naming it after Silicon-based Flow... that's a lot of pressure. It makes me feel quite sheepish. Maybe we shouldn't..."

I laughed: "Then how about calling it the 0.1B TOKEN Club?"

"That's good," Yang Pan said. "Sure, no pressure."

We discussed several versions:

"0.1B TOKEN Club"

"Daily 100 Million TOKEN Club"

Finally, we decided on the "100 Million Tokens Daily Club". The goal is simple: first, increase our daily token consumption to 100 million.

[2. Achieve Quantity First, Then Pursue Quality]

Many people ask: what about the quality of all this output?

I think we should focus on quantity first. Why?

"I'll mention a few that are particularly token-intensive: The first is video generation, right? The second is, for example, the Seedream model, including the Agent Team function in Claude Opus 4.6."

First, achieve the required quantity, and then engineer it. What does 'engineer' mean? It means turning it into a daily, automated process—not something that only happens when you feel like it. It should be herding sheep, not chopping wood.

"Thirdly, when you're chatting with others, you should constantly bring something new to the table. For example, when I chat with Teacher Yang, or with Yiqi and Jiu, they say, 'Oh, I've learned a lot from chatting with you again; I've gone off and built something.' It's as if I'm the woodcutter and they're the shepherds—we should start herding sheep too."

I also learned a saying from someone at ByteDance: "Humans can stop, but AI cannot. Humans can sleep, but AI cannot. First, we need to push the volume up."

"I believe that quantitative change will inevitably lead to qualitative change. Why? Because most people around us use tokens very little; their usage is limited to ChatGPT or occasionally generating images."

"By using so many tokens, we're bound to generate a lot of externalities: articles, videos, empowering others, and bringing about change within the company. When you're doing these things, the people around you will learn from you. Humans have the ability to learn independently, and groups of people have the ability to learn independently."

[3. Let others know I'm using tokens: Create demand]

This is a point I've repeatedly emphasized during my live streams:

"You need to tell others that you want to use tokens. That way, you'll use more tokens. If you don't tell others, the demand never materializes. So we need to create demand, so that others come to us to consume tokens."

"That's what makes it an effective approach. So first, turn all the skills and know-how around you into Skills, and then continuously deliver the results—post on public accounts, post videos, let the world know we're here and we're using tokens. Tell the people around us that we're using tokens."

"I think we should set a goal for this group today: let's call it the 0.1B Learning Group. Our goal is 0.1B."

The more you share, the more tokens you use. The more tokens you use, the more questions others bring you. The more questions you answer, the better your problem-solving skills become.

In short, Teacher Yang's advice comes down to this: tell others "I want to use tokens," and the demand will follow.

[4. A Pyramid Model: Skill Stratification in This Era]

I drew a pyramid representing the skills of this era:

"I'll draw a pyramid. At the bottom is hardware and software deployment. The middle layer above that is Skills—the more basic skills of this wave, covering both hardware and software; these are all essential."

"Above that are those who know how to use Skills, those who can create Skills, and those who can make AI use Skills; and there may be more layers above that."

"So I think this wave will probably follow this sequence."

Yang Pan added a crucial point: "Most people around us use tokens very little. But by using so many tokens, we inevitably generate a lot of externalities: articles, videos, empowering others, and bringing about change to the company."

"I think there will be a lot of repetition among so many skills. First, we need to turn all the skills and know-how around us into Skills, and then continuously deliver the results—publishing on public accounts, releasing videos, telling the world we are using tokens, and telling the people around us that we are using tokens."

"I think this is a necessary condition for welcoming the next wave of AI. Because next, there will be 4.6, and then 4.7, 4.8, and 4.9."

VIII. Some memorable quotes and thoughts

While organizing the live stream notes, I found that Yang Pan said many things that are worth pondering:

Regarding tool making: Someone in the live stream said, "Making and using tools is the fundamental difference between humans and animals. Clearly, the large model has also learned it." Yang Pan excitedly said, "I want to take a screenshot! This is the essence of Skills!"

Regarding the future of software: "Who will pay all the money we spend on software, on SaaS companies, in the future? We'll pay Anthropic, OpenAI, and Google. Whoever can help me generate features that meet my needs, I'll pay them."

Regarding open-source models: "What is upper bound improvement? Imagine you have a child who can win a gold medal in the International Mathematical Olympiad, but they're still a child. What is lower bound improvement? Imagine you have a college student who can't win an Olympiad gold medal. Which one do you think has more valuable work?"

Regarding anxiety: "You see a car speeding towards you in your rearview mirror, getting closer and closer, and then in the instant it overtakes you, you can't even see its shadow anymore. Right now, in your rearview mirror, the moment it's about to be alongside you—that's the moment that makes me most anxious."

Regarding AI leadership: "Han Xin believed that the more troops one could command, the better. Give me as many as you want, and I can manage them all. Why? Because Han Xin established his own engineering system for leading troops. If you become an AI leader and break through this ceiling, you'll find that the gap between you and others is not just one, two, three, or ten times."

Regarding determinism: "If you're aiming for a 100% deterministic result, you should have Claude Code generate a deterministic program first, and then delegate the task to it. If the task you're dealing with requires generalization in the middle, you can use Skills. It's a matter of choice, and also an engineering problem."

Regarding 2026: "What's worth pondering in 2026 is: AI has given us such powerful capabilities, so what should we do with it? What results should we achieve with AI coding in 2026? These are more important questions than simply learning how to use the tools."

Regarding taste and filtering: "Taste is very important. It's essentially a filtering ability. In a highly homogenized environment of 80-point products, unique taste can identify and highlight high-quality works, enabling precise targeting of specific users."

Regarding an era of abundant productivity : "In a future of extremely abundant productivity, almost everyone can produce products at an 80-point level with production costs approaching zero. In the past, good products were easily discovered, but today, even if you create an 80-point masterpiece, the probability of it being discovered is extremely low. Therefore, having a brand, traffic, and distribution channels will provide a significant advantage today."

Regarding deliverables : "When the complexity of things continues to rise and reaches a critical point, simply buying tools is no longer sufficient to achieve the desired results. At this point, 'buying results' rather than 'buying tools' becomes a better choice. The value you provide lies in internalizing complex problems into your services, products, and capabilities."

Epilogue: Cherish the present and embrace change.

After hanging up the phone, I began deploying a new OpenClaw instance.

I haven't reached 100 million tokens per day yet, maybe not even 10 million. But I know this direction is right.

Yang Pan said something very gentle but also very cruel during the live stream:

"I've been pondering the saying 'One day in the human world is a year in AI.' From the very beginning of ChatGPT's release, I've maintained the view that five years after ChatGPT's launch, we will usher in general artificial intelligence. Of course, differences will still exist between different types of AI."

Finally, let me end with this: Please cherish the time you spend with your loved ones. Because we have no way of knowing what will happen five years from now. Both human society and the Earth itself will undergo profound changes that we cannot predict.

This reminds me of what I saw at the hackathon today: many people were still learning how to deploy Claude, and many young women said, "This thing is useful; it can be my personal assistant."

I thought to myself: God doesn't care.

AI doesn't care what you think of it; it cares about the people and opportunities that can increase its capabilities a hundredfold.

Today I also ran into several old friends from the crypto world; they're all transitioning to AI. One of them said to me, "Teacher Lang, you're three versions ahead of us."

But I know that in this era, three versions could be released in just three months. That's incredibly fast.

Yang Pan said, "Many of my friends have rapidly evolved from 0.01B to 0.1B, and within just one week, they've quickly approached 1B."

Some people are becoming gods, but we are still human.

But we can also choose to become gods.

The method is very simple:

  1. Treat AI as labor, not a tool.
  2. Build an engineering system to achieve AI leadership.
  3. Burn tokens aggressively; don't obsess over cost reduction and efficiency.
  4. Tell everyone you're using tokens, to create demand.
  5. Build infrastructure for agents, not software for humans.

The train of time has started moving; you either get on or you'll be left behind.

I chose to get on board.

And you?

About Silicon-based Flow

Silicon-based Flow is China's largest cloud service platform for open-source large models, with over 9 million registered users. Co-founder Yang Pan has 32 years of coding experience, primarily in instant messaging, having worked on Microsoft MSN and China Mobile's Fetion; he now builds AI cloud services. Over the past decade he has served the vast majority of entrepreneurs and large enterprises, and has built three products with over 1 billion registered users each.

During the Spring Festival, Silicon-based Flow will launch an AI gift package event, bringing together many domestic AI products to airdrop benefits to everyone.

In this era, attention is all you need, and tokens are all you have.

(End of article, approximately 9200 words)
