[Introduction] With an AI chip Armageddon looming, many chip startups are itching to take their shot, hoping to break Nvidia's monopoly in one stroke. And AMD's MI300 reportedly even outperforms the H100 when serving a GPT-4 model with a 32K context window.
An AI chip Armageddon is coming!
What does the future hold for AI hardware startups?
Tenstorrent executive David Bennett put it bluntly: in this industry, most startups end in bankruptcy.
How to survive? He suggested that startups should remain flexible and avoid limiting themselves to narrow use cases.
Here, Bennett is not referring to a niche market but to a broad group of dozens of companies, from SiMa.ai to Cerebras, which together have raised billions of dollars in venture capital to take on the market leader, Nvidia.
Bennett knows this well: he spent more than a decade in sales at AMD.
The CEO of his current employer, Tenstorrent, is Jim Keller, a hardware legend who helped create the Apple A4 and A5 processors that powered the iPhone 4 and iPad 2, and who ran hardware for Tesla's self-driving program from 2016 to 2018.
Tenstorrent, which has raised more than $300 million from investors including Fidelity Ventures and Hyundai Motor, is following Bennett's advice: offering everything from chips to cloud computing and beyond.
Bennett said that today's chip startups are generally caught between "building dedicated hardware for AI" and "riding the currently popular models."
Nvidia's disadvantage: It's hard to make chips from scratch
The story of how Nvidia's GPUs became AI chips actually works in favor of today's chip startups.
Graphics processing units were originally built to power computer graphics, but they took off in AI because they can perform many calculations in parallel.
However, this happy accident also has a downside for Nvidia: it is now hard for the company to build chips from scratch without disrupting its existing GPU business, and that gives emerging startups an opening to build new hardware designed specifically for AI.
For example, Tenstorrent engineers designed the Grayskull chip with future sparse neural networks in mind, networks from which redundant information can be pruned away.
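As a rough illustration of the sparsity idea (generic magnitude pruning, not Tenstorrent's actual technique), the sketch below zeroes out small weights; hardware built for sparse networks can then skip those zeros:

```python
import torch
from torch import nn
import torch.nn.utils.prune as prune

# Magnitude pruning: mask out the 90% of weights with the smallest
# absolute values. Purely illustrative; the layer size and pruning
# ratio are arbitrary choices.
layer = nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.9)

sparsity = (layer.weight == 0).float().mean().item()
print(f"fraction of weights now zero: {sparsity:.0%}")  # ~90%
```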
Still, Bennett believes that startups focused on building chips for large language models are too tightly tied to the Transformer architecture.
Under this architecture, Transformer-based models essentially predict the most likely next word, and for that reason they have been criticized for generating answers based on probability rather than on reasoning.
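To make that criticism concrete, here is a toy sketch (made-up vocabulary and scores, heavily simplified) of what the output step of such a model boils down to: converting scores into a probability distribution and drawing the next token from it.

```python
import torch

# Toy next-token step: a Transformer's head produces one score (logit)
# per vocabulary entry; softmax turns the scores into probabilities and
# the next token is sampled from that distribution. Values are made up.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = torch.tensor([0.3, 1.2, 2.5, 0.4, 0.9])

probs = torch.softmax(logits, dim=-1)                 # sums to 1
next_id = torch.multinomial(probs, num_samples=1).item()
print(vocab[next_id], [round(p, 2) for p in probs.tolist()])
```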
This means that these model architectures may not survive the current AI boom.
After all, the field is moving so fast that today's LLMs have relatively short lifespans: a model that was hot yesterday may be gone in a week or two.
Another risky area for hardware companies is making chips specifically for inference.
The representative in this regard is chip developer d-Matrix, which plans to release a dedicated inference chip in the first half of next year.
At first glance, this strategy looks sound: users of generative AI applications are increasingly relying on existing proprietary or open-source models rather than building their own from scratch.
Because of this, many people believe that more money should be spent on model inference rather than model training.
While this may be a smart move from a business perspective, Bennett believes that focusing too narrowly on inference will prevent hardware developers from serving other use cases that may be more popular.
For example, a pure inference chip is perfectly adequate for the low-precision calculations needed to run a model.
However, if developers want to fine-tune large models, they will likely need a chip that can handle higher-precision calculations.
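A minimal sketch of the distinction (framework-level and not tied to any particular chip; the layer sizes are arbitrary): inference can run the same layer at reduced precision, while fine-tuning typically keeps higher-precision weights so that small gradient updates are not rounded away.

```python
import torch
from torch import nn

# Inference path: cast weights to bfloat16 to cut memory and bandwidth.
infer_model = nn.Linear(4096, 4096).to(torch.bfloat16)
with torch.no_grad():
    y = infer_model(torch.randn(1, 4096, dtype=torch.bfloat16))

# Fine-tuning path: keep float32 weights for stable optimization.
train_model = nn.Linear(4096, 4096)  # float32 by default
opt = torch.optim.AdamW(train_model.parameters(), lr=1e-4)
loss = train_model(torch.randn(8, 4096)).pow(2).mean()
loss.backward()  # float32 gradients
opt.step()
```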
Cutting-edge chips that put GPU and CPU together
In order to survive the coming AI chip Armageddon, chip developers need to change the architecture of their chips.
Today, most chips separate the GPU and CPU. The former can perform multiple calculations simultaneously, while the latter is responsible for executing more general instructions and managing a wider range of system operations.
However, more and more cutting-edge chips (such as Nvidia's Grace Hopper super chip and AMD's upcoming MI300A) put the GPU and CPU together.
This layout allows the CPU to prepare data faster and load it onto the GPU, thus speeding up model training.
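The bottleneck being attacked here shows up in ordinary training code. In the sketch below (hypothetical shapes; it needs a GPU to run), the CPU stages each batch and copies it across the PCIe bus, and pinned memory plus asynchronous copies are the standard software workaround for exactly the data path a unified CPU+GPU package shortens.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(10_000, 1024), torch.randint(0, 10, (10_000,)))
loader = DataLoader(ds, batch_size=256, pin_memory=True)  # page-locked host RAM

device = torch.device("cuda")
for x, y in loader:
    # non_blocking=True lets the host-to-device copy overlap GPU compute.
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    ...  # forward/backward step would go here
```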
In addition, hardware startups seeking to break Nvidia's market dominance face one of the biggest obstacles of all: Nvidia's software advantage.
Nvidia's CUDA software, which is used to write machine learning applications, runs only on Nvidia's own chips, and this effectively locks developers into Nvidia GPUs.
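At the framework level, one partial escape from that lock-in is device-agnostic code: AMD's ROCm builds of PyTorch expose the same torch.cuda namespace (HIP is mapped underneath), so a sketch like the following runs unchanged on either vendor's GPU.

```python
import torch

# Picks an NVIDIA GPU, an AMD GPU (under a ROCm build of PyTorch),
# or the CPU, with no vendor-specific code.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(128, 64).to(device)
x = torch.randn(32, 128, device=device)
print(model(x).shape, "on", device)
```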
AMD's MI300 runs GPT-4 even better
Is Nvidia’s dominance so difficult to shake?
SemiAnalysis analysts Dylan Patel and Myron Xie recently published an article arguing that AMD's MI300 will beat Nvidia's H100 significantly on cost performance.
They said that with the launch of the new generation MI300, AMD will soon become the only competitor of Nvidia and Google in the field of LLM inference.
In contrast, Groq, SambaNova, Intel, Amazon, Microsoft and other companies still cannot compete with it.
In addition, to counter the moat Nvidia has built on CUDA, AMD has been investing heavily in its own ROCm software, the PyTorch ecosystem, and OpenAI's Triton.
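Triton is central to this bet because kernels written in it can compile for both Nvidia and AMD back ends rather than being tied to CUDA. A minimal sketch (the classic vector-add from Triton's tutorials; the block size is an arbitrary choice):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK_SIZE=1024)
    return out
```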
As companies such as Databricks, AI21, Lamini, and Moreh begin using AMD GPUs for inference and training, AMD's ecosystem is steadily maturing.
According to industry insiders, the MI300, with its larger memory, performs better when deploying GPT-4 models with a 32K context window.
Specifically, the MI300's performance advantage over the H100 is between 20% and 25%, depending on context length and on the prompt length/number of tokens generated per query.
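A back-of-envelope calculation shows why memory capacity dominates at long context: the KV cache that inference must keep resident grows linearly with context length. All model dimensions below are assumed placeholders (GPT-4's real configuration is not public):

```python
# Rough KV-cache sizing for one 32K-context sequence in fp16.
n_layers   = 96      # assumed layer count
n_kv_heads = 64      # assumed KV heads
head_dim   = 128     # assumed per-head dimension
seq_len    = 32_768  # the 32K context window discussed above
bytes_el   = 2       # fp16/bf16 element size

# Keys and values are cached per layer, per head, per position.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_el
print(f"KV cache per sequence: {kv_bytes / 2**30:.0f} GiB")  # 96 GiB here
# Under these assumptions, one sequence's cache alone exceeds an H100's
# 80 GB but fits comfortably in an MI300X's 192 GB, before even counting
# the model weights.
```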
Coupled with the lower price, MI300 will be significantly better than Nvidia's H100 or even H200 in terms of cost performance.
Major manufacturers are placing orders one after another
Currently, Microsoft, Meta, Oracle, Google, Supermicro/Quantadirect, Amazon and other companies have placed orders for approximately 205,000 MI300 units from AMD.
Of these, 120,000 are exclusively for Microsoft, 25,000 for Meta, 12,000 for Oracle, 8,000 for Google, 5,000 for Amazon, and 35,000 for other companies.
And due to the huge quantity, Microsoft's purchase price for MI300 is expected to be much lower than that of other customers.
Estimating the revenue the MI300 will bring AMD next year requires looking at two things: how much supply AMD can secure, and how much its major customers will order.
On the supply side, MI300 production capacity will ramp gradually over the course of the year. Nvidia's B100, which begins shipping in the second quarter and ramps significantly in the third with the launch of a more cost-effective air-cooled version, will not matter in the near term, but it will largely affect AMD's shipments in the fourth quarter.
At the same time, one must also factor in memory makers' HBM output, CoWoS capacity, packaging capacity, and the mix of accelerators built on CoWoS, including those from NVIDIA, AMD, Google/Broadcom, Meta/Broadcom, Intel/Alchip, Amazon/Alchip, Amazon/Marvell, Microsoft/GUC, and others.
Even so, the industry still believes that MI300X shipments in the fourth quarter can reach 110,000 units.
On the customer side, Microsoft, Meta, Oracle, Google, Supermicro/Quantadirect and Amazon are the main sources of orders, but there are also some orders from other parts of the supply chain, including some MI300As for HPC-type applications.
On margins, Nvidia shows no sign of cutting prices; it has only increased HBM capacity/bandwidth while holding prices steady. Against Nvidia's margin of over 80%, AMD's margin on the MI300 barely exceeds 50%.
AMD CEO Lisa Su said that, based on the company's rapid progress in AI and cloud customers' purchase commitments, data center GPU revenue is expected to reach $400 million in the fourth quarter and exceed $2 billion in 2024.
This growth will also make MI300 the fastest product in AMD's history to reach $1 billion in sales.
On this point, the industry is even more optimistic about MI300X sales, expecting them to reach $3.5 billion.
Given that AMD currently holds less than 0.1% of the LLM training and inference market, its share of the data center business still has plenty of room to grow steadily.
References:
https://www.theinformation.com/articles/an-ai-chip-armageddon-is-coming-biden-punts-on-open-source-llms?rc=epv9gi
https://www.semianalysis.com/p/amd-mi300-ramp-gpt-4-performance
This article is from the WeChat public account "Xin Zhiyuan" (ID: AI_era), editors: Hao Kong, Aeneas; republished by 36Kr with authorization.