China's "efficiency revolution" in computing power is more effective than expanding production lines for storage.
A counterintuitive fact: Chinese AI companies are achieving similar results with less memory, and their research papers are openly published. Adopting these techniques could potentially cut the inference costs of the three major overseas AI players (OpenAI, Anthropic, and Google with Gemini) by an order of magnitude, raising their gross profit margins while also cutting their memory requirements by an order of magnitude.
Take DeepSeek's MLA (Multi-head Latent Attention) architecture, KV cache optimization, and various model quantization techniques as examples: they directly and significantly reduce GPU memory usage and bandwidth requirements during inference, producing a precipitous drop in the cost per generated token. Zhipu offers ultra-high-speed inference, and Alibaba's Qianwen and Xiaomi have cut cached-token billing to one-tenth. What do these moves have in common? They all compress cost through algorithmic efficiency, squeezing the most out of the available computing power.
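To make the memory claim above concrete, here is a rough back-of-the-envelope sketch of per-token KV cache size under standard multi-head attention versus an MLA-style compressed cache, with and without 8-bit quantization. Every dimension below (layer count, head count, latent size) is an assumed, illustrative value, not the confirmed configuration of any particular model, and the function name is hypothetical.

```python
def kv_cache_bytes_per_token(n_layers: int, elems_per_layer: int, bytes_per_elem: int) -> int:
    """Bytes of attention cache stored for one token across all layers."""
    return n_layers * elems_per_layer * bytes_per_elem


# Assumed, illustrative dimensions (not taken from the article).
N_LAYERS = 60
N_HEADS = 128
HEAD_DIM = 128
LATENT_DIM = 512  # size of the compressed KV latent (assumption)
ROPE_DIM = 64     # size of the separate positional key (assumption)

# Standard multi-head attention caches one key and one value vector per head.
mha_elems = 2 * N_HEADS * HEAD_DIM
# An MLA-style cache keeps one shared latent plus a small positional key per layer.
mla_elems = LATENT_DIM + ROPE_DIM

mha_fp16 = kv_cache_bytes_per_token(N_LAYERS, mha_elems, 2)  # 16-bit elements
mla_fp16 = kv_cache_bytes_per_token(N_LAYERS, mla_elems, 2)  # 16-bit elements
mla_int8 = kv_cache_bytes_per_token(N_LAYERS, mla_elems, 1)  # 8-bit quantized cache

print(f"MHA fp16 : {mha_fp16 / 1024:.1f} KiB per token")
print(f"MLA fp16 : {mla_fp16 / 1024:.1f} KiB per token ({mha_fp16 / mla_fp16:.1f}x smaller)")
print(f"MLA int8 : {mla_int8 / 1024:.1f} KiB per token ({mha_fp16 / mla_int8:.1f}x smaller)")
```

Under these assumed numbers the compressed cache is tens of times smaller per token, which is the kind of gap the argument relies on. Real savings depend on the actual model dimensions, context length, and batch size.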
However, the market is using old maps to find new paths.
US AI companies are still pouring money into capital expenditure, locking in large amounts of production capacity and computing power in advance. $700 billion in capital expenditure is enough to make the entire AI supply chain, upstream and downstream, celebrate. This logic is sound: demand for computing power and memory is indeed still very large and growing rapidly. The problem is that it overlooks another curve: China's efficiency gains from computing power optimization are equally astonishing.
Everyone is betting that the "water sellers" can continue to make money, but no one has noticed that those mining gold have suddenly learned to recycle water.
If Chinese AI companies cut memory usage by a further 50%, will the narrative behind storage stocks, whose valuations are currently propped up by capital expenditure, still hold?
The current exorbitant profits in the AI hardware supply chain are largely built on an absolute dependence on the highest-end HBM (high-bandwidth memory). If models' memory demand falls significantly, it could directly break the monopoly premium of the existing leading manufacturers, and the underlying logic of storage and computing power stocks, whose valuations are propped up by capital expenditure, would weaken.
It seems that no one in the market is seriously calculating how much memory China's efficiency revolution at the algorithm layer can actually save.
However, objectively speaking, if inference costs and memory usage fall by 50%, AI agents could make high-frequency API calls around the clock, triggering a massive explosion in AI applications. Even if each call uses less, a tenfold increase in total call volume would still make overall absolute demand for memory and computing power surge.
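The rebound arithmetic in the paragraph above can be sketched in a few lines. This is a deliberately simple model that assumes total demand is just per-call usage multiplied by call volume. The function names are hypothetical, and the 50% and tenfold figures are the article's own hypotheticals rather than measured data.

```python
def total_demand_multiplier(per_call_reduction: float, call_growth: float) -> float:
    """Change in total demand when each call uses less but calls multiply.

    per_call_reduction: fraction of per-call usage saved (0.5 means 50% less).
    call_growth: factor by which total call volume grows (10 means tenfold).
    """
    if not 0.0 <= per_call_reduction < 1.0:
        raise ValueError("per_call_reduction must be in [0, 1)")
    if call_growth < 0:
        raise ValueError("call_growth must be non-negative")
    return (1.0 - per_call_reduction) * call_growth


def break_even_call_growth(per_call_reduction: float) -> float:
    """Call-volume growth at which total demand is exactly unchanged."""
    if not 0.0 <= per_call_reduction < 1.0:
        raise ValueError("per_call_reduction must be in [0, 1)")
    return 1.0 / (1.0 - per_call_reduction)


# The article's scenario: 50% less per call, ten times as many calls.
print(total_demand_multiplier(0.5, 10))  # net change in total demand
print(break_even_call_growth(0.5))       # growth needed just to stand still
```

In this model, halving per-call usage while call volume grows tenfold leaves total demand at five times its starting level, and call volume only has to double for demand to stay flat. The open question for storage valuations is which side of that break-even point real usage lands on.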
China's computing power efficiency gains may prove more effective than expanding storage production lines, potentially breaking the monopoly premium of the existing leading manufacturers. This is a risk that needs to be monitored, and it remains to be seen how far this efficiency path can go and whether the gains can keep compounding.
What is uncertain is how long this "unpriced" window of opportunity will last. Perhaps three months, perhaps a year.