Chainfeeds Summary:
This revolution may bring an end to the grand party meticulously organized by the AI shovel sellers sooner than anyone imagined.
Article source:
https://www.techflowpost.com/zh-CN/article/30329
Article Author:
Bing Ventures
Opinion:
Bing Ventures: This revolution is not a single event but the intertwining of two seemingly independent technological paths.

The first path is a revolution in algorithmic slimming. In the past, we assumed that for a very large model to get smarter, it had to mobilize all of its parameters at full capacity, burning ever more computing power. DeepSeek broke that consensus with its MoE (Mixture-of-Experts) architecture. Think of MoE as a company with hundreds of experts that, for each problem, invites only the two or three most relevant people into the meeting instead of having everyone brainstorm together. The overall model is still huge, but each inference activates only a small fraction of the parameters, sharply reducing compute consumption (sketched in the first code block below). DeepSeek-V2 nominally has 236 billion parameters, yet each forward pass calls on only about 21 billion of them, less than 9% of the total, while its performance is comparable to GPT-4, which has to run at full load. This means that, for the first time, AI capability and computing power consumption have been systematically decoupled. The ironclad rule of "stronger = more GPUs" has been broken, and this algorithm-level evolution directly weakens the must-have status of Nvidia GPUs.

The second path is a hardware revolution. AI work splits into two stages: training and inference. Training stresses massively parallel computation, where GPUs have a natural advantage; inference stresses response speed and energy efficiency, where GPUs run into structural bottlenecks. The core problem is that a GPU's high-bandwidth memory (HBM) sits outside the compute die, and shuttling data back and forth introduces physical latency. It's like a chef who has to run to the next room to fetch ingredients for every dish; no matter how fast he runs, instant service is impossible (the second sketch below puts numbers on this). Emerging companies such as Cerebras and Groq have chosen to rebuild the chip architecture entirely, placing high-speed SRAM directly on the chip to achieve near-zero-latency data access, purpose-built for inference workloads. The market has already begun voting with real money: OpenAI, while complaining about the high cost and low efficiency of GPU inference, signed a long-term compute contract worth tens of billions of dollars with Cerebras; Nvidia moved quickly as well, spending roughly $20 billion to acquire Groq in an attempt to plug its gap in the inference track. The center of gravity of AI computing is shifting from general-purpose GPUs to dedicated inference chips.

When algorithmic slimming and hardware specialization converge, there is only one outcome: a cost collapse. A slimmed-down MoE model is small enough to fit into an inference chip's on-chip memory, and dedicated chips eliminate the external-memory bottleneck, raising inference speed by an order of magnitude. Training costs fall by roughly 90% thanks to sparse computation, and inference costs fall by another order of magnitude; combined, the total cost of building and running world-class AI may be only 10%–15% of a traditional GPU setup (the third sketch below shows how the factors combine). This is not an incremental improvement but a paradigm shift. Nvidia's trillion-dollar market capitalization is built on the single narrative that AI must rely on GPUs. As training demand is compressed by algorithms and the inference market is siphoned off by dedicated chips, that monopoly begins to crumble.
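To make the "only a few experts per token" idea concrete, here is a minimal sketch of top-k MoE routing in NumPy. The sizes (64 experts, top-2 routing, 512-dimensional tokens) are illustrative placeholders rather than DeepSeek's actual configuration; the point is simply that only the selected experts' weights ever take part in the computation.

```python
# Minimal sketch of top-k Mixture-of-Experts (MoE) routing in NumPy.
# Sizes are illustrative, not DeepSeek-V2's real configuration.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff, n_experts, top_k = 512, 2048, 64, 2

# Each expert is an independent feed-forward block (two weight matrices).
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02  # gating weights

def moe_forward(x):
    """Route a single token vector to its top-k experts only."""
    logits = x @ router                                      # score every expert
    top = np.argsort(logits)[-top_k:]                        # keep the k best
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over winners
    out = np.zeros_like(x)
    for gate, idx in zip(gates, top):
        w_in, w_out = experts[idx]
        out += gate * (np.maximum(x @ w_in, 0.0) @ w_out)    # only k experts run
    return out

token = rng.standard_normal(d_model)
y = moe_forward(token)

# Fraction of expert parameters actually touched for this token:
print(f"active share of expert params: {top_k / n_experts:.1%}")  # ~3.1%
```

Per token, the compute bill scales with the top-k experts rather than with all of them, which is why total parameter count and per-inference cost come apart.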
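The chef-and-ingredients analogy can also be put in numbers. The sketch below does rough roofline arithmetic for memory-bound decoding; the ~3 TB/s and ~80 TB/s bandwidth figures are order-of-magnitude assumptions for external HBM and on-chip SRAM respectively, not vendor specifications.

```python
# Rough roofline arithmetic: why decoding is a memory problem, not a FLOPs problem.
# Bandwidth figures are order-of-magnitude assumptions, not vendor specs.

active_params = 21e9          # parameters touched per token (the DeepSeek-V2 figure cited above)
bytes_per_param = 2           # FP16/BF16 weights
weight_bytes = active_params * bytes_per_param

hbm_bandwidth  = 3e12         # ~3 TB/s, assumed for an HBM-equipped GPU
sram_bandwidth = 80e12        # tens of TB/s, assumed for an on-chip-SRAM design

for name, bw in [("external HBM", hbm_bandwidth), ("on-chip SRAM", sram_bandwidth)]:
    # Every decoded token must stream the active weights from memory at least once,
    # so bandwidth alone caps tokens/second regardless of raw compute.
    latency_ms = weight_bytes / bw * 1e3
    print(f"{name:>13}: >= {latency_ms:5.1f} ms per token, <= {1e3 / latency_ms:6.0f} tokens/s")
```

Under these assumptions the external-memory design is capped at tens of tokens per second per chip, while keeping the weights on-chip lifts the ceiling by more than an order of magnitude, which is the structural bottleneck the dedicated inference chips are attacking.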
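Finally, a back-of-the-envelope check on how the two cost claims combine into the 10%–15% figure; the 50/50 spend split and the slack scenarios are assumptions for illustration, not numbers from the article.

```python
# How a ~90% training-cost cut and a ~10x inference-cost cut combine.
# The spend split and slack scenarios are assumptions, not figures from the article.
train_share, infer_share = 0.5, 0.5            # hypothetical split of total AI spend

# (training cost multiplier, inference cost multiplier), with some slack for the
# parts of each pipeline that do not compress as well.
scenarios = [(0.10, 0.10), (0.10, 0.20), (0.15, 0.15)]

for train_f, infer_f in scenarios:
    total = train_share * train_f + infer_share * infer_f
    print(f"train x{train_f:.2f}, infer x{infer_f:.2f} -> total ≈ {total:.0%} of the GPU baseline")
```

The scenarios land between roughly 10% and 15% of the baseline, which is the band the article arrives at.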
The biggest black swan ahead may not be the explosive popularity of some AI application, but a seemingly insignificant new MoE paper or an inference-chip market-share report quietly announcing a new phase of the computing-power war. When the shovel seller's shovel is no longer unique, his golden age may be drawing to a close.