Understanding Jensen Huang's NVIDIA GTC keynote: computing power never sleeps


Author: Su Yang, Hao Boyang; Source: Tencent Technology

As the "shovel seller" of the AI era, Jensen Huang and his NVIDIA firmly believe that computing power never sleeps.


Jensen Huang said in his GTC keynote that inference has increased computing-power requirements 100-fold

At today's GTC conference, Jensen Huang brought out the new Blackwell Ultra GPU, along with the server SKUs derived from it for inference and agents, plus a full RTX lineup based on the Blackwell architecture. All of it revolves around computing power; the harder question that follows is how to consume that endless computing power sensibly and effectively.

In Jensen Huang's eyes, the road to AGI requires computing power, embodied intelligent robots require computing power, and building Omniverse and world models requires a steady stream of it. As for how much computing power humanity ultimately needs to build a virtual "parallel universe," NVIDIA has an answer: 100 times as much as before.

To support his point, Jensen Huang showed a set of data at GTC: in 2024, the top four US cloud providers purchased a total of 1.3 million Hopper-architecture chips; in 2025, that figure is expected to soar to 3.6 million Blackwell GPUs.

The following are some key points from NVIDIA’s GTC 2025 conference compiled by Tencent Technology:

The Blackwell "family bucket" launches

1) Blackwell Ultra, this year's "nuclear bomb," is squeezing the toothpaste

NVIDIA unveiled the Blackwell architecture and launched the GB200 chip at last year's GTC. This year the official naming is adjusted slightly: instead of the previously rumored GB300, it is simply called Blackwell Ultra.

From a hardware standpoint, though, it simply swaps new HBM memory onto last year's design. In a nutshell: Blackwell Ultra = Blackwell, large-memory edition.

Blackwell Ultra packages two Blackwell-architecture dies built on TSMC's N4P (5nm-class) process together with a Grace CPU, and carries more advanced 12-layer stacked HBM3e memory, raising memory capacity to 288GB. Like the previous generation, it supports fifth-generation NVLink with 1.8TB/s of chip-to-chip interconnect bandwidth.


NVLink performance parameters over the generations

Alongside the memory upgrade, the FP4-precision compute of the Blackwell Ultra GPU reaches 15 PetaFLOPS, and inference with the attention-acceleration mechanism runs 2.5x faster than on Hopper-architecture chips.

2) Blackwell Ultra NVL72: a cabinet dedicated to AI inference


Blackwell Ultra NVL72 official picture

Like the GB200 NVL72, NVIDIA this year launched a matching cabinet product, the Blackwell Ultra NVL72. It consists of 18 compute trays, each containing 4 Blackwell Ultra GPUs + 2 Grace CPUs, for a total of 72 Blackwell Ultra GPUs + 36 Grace CPUs; HBM memory totals 20TB with 576TB/s of aggregate bandwidth. On top of that come 9 NVLink switch trays (18 NVLink switch chips) and 130TB/s of NVLink bandwidth between nodes.
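The cabinet totals follow directly from the per-tray figures; a quick sanity check (the 288GB-per-GPU figure is taken from the Blackwell Ultra specs above):

```python
# Sanity-check the Blackwell Ultra NVL72 cabinet totals from per-tray figures.
trays = 18
gpus_per_tray = 4
cpus_per_tray = 2
hbm_per_gpu_gb = 288  # per the Blackwell Ultra memory spec above

total_gpus = trays * gpus_per_tray                 # 72 Blackwell Ultra GPUs
total_cpus = trays * cpus_per_tray                 # 36 Grace CPUs
total_hbm_tb = total_gpus * hbm_per_gpu_gb / 1024  # ~20 TB of HBM across the rack

print(total_gpus, total_cpus, total_hbm_tb)
```

The 72-GPU and 36-CPU counts match the quoted totals exactly, and 72 x 288GB comes out at roughly the stated 20TB.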

The cabinet has 72 built-in CX-8 network cards providing 14.4TB/s of bandwidth, while Quantum-X800 InfiniBand and Spectrum-X 800G Ethernet cards reduce latency and jitter and support large-scale AI clusters. The rack also integrates 18 BlueField-3 DPUs for enhanced multi-tenant networking, security, and data acceleration.

NVIDIA says the product is custom-built "for the era of AI inference." Its application scenarios include reasoning AI, Agents, and physical AI (data simulation and synthesis for robotics and autonomous-driving training). Compared with the previous-generation GB200 NVL72, AI performance improves by 1.5x; compared with the comparably positioned DGX cabinets on the Hopper architecture, it can offer data centers a 50x revenue-increase opportunity.

According to official figures, inference on the 671-billion-parameter DeepSeek-R1 reaches 100 tokens per second on an H100-based product, while the Blackwell Ultra NVL72 solution reaches 1,000 tokens per second.

In time terms: for the same inference task, an H100 needs to run about 1.5 minutes, while the Blackwell Ultra NVL72 finishes in 15 seconds.
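The time figures come from a simple throughput model. A rough sketch (the ~9,000-token task size is our assumption, back-derived from the 1.5-minute H100 figure; NVIDIA's quoted 15 seconds for the NVL72 implies per-request overhead beyond raw decode throughput):

```python
# Rough completion-time model: time = task tokens / decode throughput.
# The 9,000-token task size is an illustrative assumption, not an NVIDIA figure.
task_tokens = 9_000

def completion_seconds(tokens_per_second: float) -> float:
    """Idealized wall-clock time to finish the task at a given throughput."""
    return task_tokens / tokens_per_second

h100_time = completion_seconds(100)     # ~90 s, i.e. the quoted 1.5 minutes
nvl72_time = completion_seconds(1_000)  # ~9 s from raw throughput alone
print(h100_time, nvl72_time)
```

That the official NVL72 figure is 15 seconds rather than the idealized 9 suggests the published numbers include latency components a pure throughput model ignores.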


Blackwell Ultra NVL72 and GB200 NVL72 Hardware Parameters

According to NVIDIA, Blackwell NVL72-related products are expected to ship in the second half of 2025. Customers include server manufacturers, cloud providers, and GPU-leasing companies:

  • Server manufacturers

    15 manufacturers including Cisco/Dell/HPE/Lenovo/Supermicro

  • Cloud providers

    AWS/Google Cloud/Azure/Oracle Cloud and other mainstream platforms

  • GPU-leasing companies

    CoreWeave/Lambda/Yotta, etc.

3) Advance notice of the real "nuclear bomb": the Rubin GPU

According to NVIDIA's roadmap, Blackwell Ultra is the headliner of GTC 2025.

Still, Jensen Huang took the opportunity to preview the next-generation Rubin-architecture GPU arriving in 2026 and its more powerful cabinet, the Vera Rubin NVL144: 72 Vera CPUs + 144 Rubin GPUs, HBM4 chips with 288GB of memory, 13TB/s of memory bandwidth, plus sixth-generation NVLink and CX9 network cards.

How powerful is it? Its FP4 inference compute reaches 3.6 ExaFLOPS and its FP8 training compute 1.2 ExaFLOPS, 3.3x the performance of the Blackwell Ultra NVL72.

If that still isn't enough, no matter: in 2027 comes the even beefier Rubin Ultra NVL576 cabinet, with FP4 inference and FP8 training compute of 15 ExaFLOPS and 5 ExaFLOPS respectively, 14x the Blackwell Ultra NVL72.


Parameters of the Vera Rubin NVL144 and Rubin Ultra NVL576, as provided by NVIDIA

4) Blackwell Ultra version of DGX Super POD "supercomputing factory"

For customers whose needs a Blackwell Ultra NVL72 cannot yet satisfy, but who do not need to build an ultra-large-scale AI cluster, NVIDIA's answer is the plug-and-play DGX Super POD, an AI supercomputing factory based on Blackwell Ultra.

As a plug-and-play AI supercomputing factory, DGX Super POD is designed mainly for generative AI, AI Agent, and physical-simulation scenarios, covering compute-scaling needs across the full pipeline from pre-training and post-training to production. Equinix, as the first service provider, supplies the liquid- and air-cooled infrastructure support.


DGX SuperPod built with Blackwell Ultra

The Blackwell Ultra-based DGX Super POD comes in two versions:

  • DGX SuperPOD with built-in DGX GB300 (Grace CPU ×1 + Blackwell Ultra GPU ×2), a total of 288 Grace CPUs + 576 Blackwell Ultra GPUs, providing 300TB of fast memory and a computing power of 11.5 ExaFLOPS at FP4 precision

  • DGX SuperPOD with built-in DGX B300. This version does not contain the Grace CPU chip, has further expansion space, and uses an air-cooled system. The main application scenario is ordinary enterprise-level data centers.

5) DGX Spark and DGX Station

In January this year, at CES, NVIDIA unveiled a $3,000 concept AI PC, Project DIGITS; it now has an official name, DGX Spark.

On specs, it carries the GB10 chip, delivering 1 PetaFLOPS of compute at FP4 precision, with 128GB of built-in LPDDR5X memory, a CX-7 network card, 4TB of NVMe storage, and DGX OS, an operating system customized from Linux, supporting frameworks such as PyTorch. It ships with NVIDIA's basic AI software development tools pre-installed and can run models of up to 200 billion parameters. The whole machine is close to a Mac mini in size, and two DGX Sparks interconnected can run models of more than 400 billion parameters.

Although it is billed as an AI PC, it still belongs essentially to the supercomputing category, which is why it sits in the DGX product line rather than among consumer products like RTX.

Some, however, have grumbled about the product: the advertised FP4 performance is of limited practical use, and converted to FP16 precision it can only match an RTX 5070, or even the $250 Arc B580, making its price/performance very poor.


DGX Spark computers and DGX Station workstations

Beyond the officially renamed DGX Spark, NVIDIA also launched the DGX Station, an AI workstation based on Blackwell Ultra. It packs one Grace CPU and one Blackwell Ultra GPU, 784GB of unified memory, and a CX-8 network card, delivering 20 PetaFLOPS of AI compute (precision not officially stated, presumably also FP4).

6) RTX sweeps across AI PCs and squeezes into the data center too

The SKUs introduced so far are all enterprise-grade products built around Grace CPUs and Blackwell Ultra GPUs. Given how many people have found creative uses for cards like the RTX 4090 in AI inference, NVIDIA at this GTC further tightened the integration of Blackwell and the RTX series, launching a large batch of AI-PC-oriented GPUs with GDDR7 memory that cover laptops, desktops, and even data centers.

  • Desktop GPUs: RTX PRO 6000 Blackwell for Workstations, RTX PRO 6000 Blackwell for Workstations Max-Q, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell and RTX PRO 4000 Blackwell

  • Laptop GPUs: RTX PRO 5000 Blackwell, RTX PRO 4000 Blackwell, RTX PRO 3000 Blackwell, RTX PRO 2000 Blackwell, RTX PRO 1000 Blackwell, and RTX PRO 500 Blackwell

  • Data Center GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition


NVIDIA's AI "family bucket" for enterprise computing

These are just some of the SKUs customized for different scenarios, spanning workstations to data-center clusters. NVIDIA itself calls them the "Blackwell Family"; the Chinese rendering, "Blackwell family bucket," is quite apt.

NVIDIA Photonics: CPO system standing on the shoulders of teammates

The idea of co-packaged optics (CPO) is to package the switch chip and optical modules together, converting between optical and electrical signals in-package and making the most of optical links' transmission performance.

Before now, the industry had long discussed NVIDIA's CPO network switches without a launch. Huang explained why on stage: because data centers use fiber-optic connections so extensively, optical networking consumes the equivalent of 10% of computing resources, and the cost of optical links directly constrains the scale-out network between compute nodes and gains in AI performance density.


Parameters of the two silicon-photonics co-packaged chips, Quantum-X and Spectrum-X, displayed at GTC

At this year's GTC, NVIDIA launched the Quantum-X and Spectrum-X silicon-photonics co-packaged chips, along with three derived switch products: the Quantum 3450-LD, Spectrum SN6810, and Spectrum SN6800.

  • Quantum 3450-LD: 144 ports at 800Gb/s, 115.2Tb/s backplane bandwidth, liquid-cooled

  • Spectrum SN6810: 128 ports at 800Gb/s, 102.4Tb/s backplane bandwidth, liquid-cooled

  • Spectrum SN6800: 512 ports at 800Gb/s, 409.6Tb/s backplane bandwidth, liquid-cooled
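The backplane figures above are simply port count times per-port speed (note the units are terabits per second, not terabytes; the arithmetic only works in Tb/s):

```python
# Backplane bandwidth = port count x per-port speed, in Tb/s (terabits).
switches = {
    "Quantum 3450-LD": (144, 0.8),  # 144 ports x 800 Gb/s
    "Spectrum SN6810": (128, 0.8),  # 128 ports x 800 Gb/s
    "Spectrum SN6800": (512, 0.8),  # 512 ports x 800 Gb/s
}
for name, (ports, port_tbps) in switches.items():
    backplane_tbps = round(ports * port_tbps, 1)
    print(name, backplane_tbps)  # 115.2, 102.4, 409.6 Tb/s respectively
```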

These products are grouped under the "NVIDIA Photonics" banner. NVIDIA describes it as a platform co-developed with its CPO partner ecosystem; its micro-ring modulators (MRM), for example, are optimized from TSMC's optical engine, support high-power, high-efficiency laser modulation, and use detachable fiber connectors.

More interesting still, according to earlier industry reports, TSMC's micro-ring modulator (MRM) was built together with Broadcom on TSMC's 3nm process and advanced packaging technologies such as CoWoS.

According to NVIDIA's data, Photonics switches with integrated optics deliver 3.5x the performance of traditional switches, 1.3x the deployment efficiency, and more than 10x the scaling resilience.

Model efficiency still trails DeepSeek: the software ecosystem goes all-in on AI Agents


Jensen Huang describes the AI-infra "big pie" on stage

Because Huang spent only about half an hour of the two-hour GTC keynote on software and embodied intelligence, many of the details here come from official documentation rather than from the stage.

1) NVIDIA Dynamo, NVIDIA's new "CUDA of inference"

NVIDIA Dynamo is arguably the software bombshell of this event.

It is open-source software built for inference, training, and acceleration across an entire data center, and its performance numbers are striking: on the existing Hopper architecture, Dynamo can double the throughput of standard Llama models, while for specialized reasoning models such as DeepSeek, its intelligent inference optimizations can raise the number of tokens generated per GPU by more than 30x.


Jensen Huang demonstrated that Blackwell with Dynamo can outperform Hopper by 25x

These gains come mainly from disaggregation: Dynamo distributes the different computational stages of an LLM (understanding the user query versus generating the optimal response) across different GPUs, letting each stage be optimized independently, which raises throughput and shortens response times.


Dynamo system architecture

Take the input-processing (prefill) stage as an example: Dynamo can efficiently allocate GPU resources to process user input, using multiple groups of GPUs to handle the query in parallel so that processing is more distributed and faster. Dynamo calls multiple GPUs in FP4 mode to "read" and "understand" the user's question in parallel: one GPU group processes background knowledge on "World War II," another the historical data on its "causes," and a third the timeline and events of its "course." The stage works like several research assistants looking up large amounts of material at the same time.

When generating output tokens, in the decode stage, the GPUs need to be more focused and coherent. Rather than GPU count, this stage demands high bandwidth to absorb the "thinking" produced by the previous stage, and therefore heavy cache reads. Dynamo optimizes inter-GPU communication and resource allocation to keep response generation coherent and efficient. On one hand, it fully exploits the high-bandwidth NVLink of the NVL72 architecture to maximize token-generation efficiency. On the other, its "Smart Router" directs each request to the GPU that has already cached the relevant KV (key-value) data, avoiding recomputation and greatly improving processing speed. And because recomputation is avoided, GPU resources are freed up, which Dynamo can dynamically reallocate to new incoming requests.
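The Smart Router idea can be sketched in miniature. Everything below (the block-hashing scheme, the data structures, the tie-breaking policy) is our illustration of KV-cache-aware routing in general, not Dynamo's actual implementation: route each request to the worker holding the most cached prefix blocks of the prompt, falling back to the least-loaded worker.

```python
# Toy KV-cache-aware router: prefer the worker that has already cached
# the most prefix blocks of the incoming prompt (avoiding recomputation).
class Router:
    def __init__(self, n_workers: int):
        self.n_workers = n_workers
        self.load = [0] * n_workers                      # outstanding requests per worker
        self.cache = [set() for _ in range(n_workers)]   # cached prefix-block hashes

    def _prefixes(self, tokens, block=4):
        # Hash the prompt prefix at fixed block boundaries, as in paged KV caching.
        return [hash(tuple(tokens[: i + block])) for i in range(0, len(tokens), block)]

    def route(self, tokens):
        prefixes = self._prefixes(tokens)
        best, best_hits = None, -1
        for w in range(self.n_workers):
            hits = sum(1 for p in prefixes if p in self.cache[w])
            # More cached blocks win; ties go to the less-loaded worker.
            if hits > best_hits or (hits == best_hits and self.load[w] < self.load[best]):
                best, best_hits = w, hits
        self.load[best] += 1
        self.cache[best].update(prefixes)  # the chosen worker now caches this prompt
        return best

router = Router(n_workers=2)
prompt = list("what caused world war two")
print(router.route(prompt), router.route(prompt))  # repeat prompt hits the warm cache
```

A repeated prompt lands on the same worker's warm KV cache, while a cold prompt falls through to the idler worker, which is the load-freeing behavior the paragraph above describes.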

This architecture closely resembles Kimi's Mooncake architecture, but NVIDIA has built out more support in the underlying infra. Mooncake improves performance by roughly 5x, while Dynamo's gains in inference are even more pronounced.

Among Dynamo's important innovations: the "GPU Planner" dynamically adjusts GPU allocation to the load; the "Low-Latency Communication Library" optimizes data transfer between GPUs; and the "Memory Manager" intelligently moves inference data between storage tiers of different cost, further cutting operating costs. The smart router, an LLM-aware routing system, directs requests to the most appropriate GPU to reduce recomputation. Together these capabilities optimize GPU load.
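The GPU Planner idea reduces to a rebalancing loop: shift GPUs toward whichever stage (prefill or decode) has the deeper per-GPU backlog. The policy below is our illustration, not Dynamo's actual algorithm:

```python
# Toy GPU planner: migrate one GPU toward the stage with the larger per-GPU backlog.
def rebalance(prefill_gpus: int, decode_gpus: int,
              prefill_queue: int, decode_queue: int):
    """Return the new (prefill, decode) GPU split after one planning step."""
    prefill_pressure = prefill_queue / max(prefill_gpus, 1)
    decode_pressure = decode_queue / max(decode_gpus, 1)
    if prefill_pressure > decode_pressure and decode_gpus > 1:
        return prefill_gpus + 1, decode_gpus - 1   # prefill is the bottleneck
    if decode_pressure > prefill_pressure and prefill_gpus > 1:
        return prefill_gpus - 1, decode_gpus + 1   # decode is the bottleneck
    return prefill_gpus, decode_gpus               # balanced: no migration

# Decode backlog grows, so a GPU migrates from the prefill pool to decode.
print(rebalance(prefill_gpus=8, decode_gpus=8, prefill_queue=10, decode_queue=40))
```

Run in a loop against live queue depths, a policy like this keeps both stages busy instead of letting one pool idle while the other backs up.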

This inference system scales efficiently to large GPU clusters, allowing a single AI query to span up to 1,000 GPUs seamlessly and make full use of data-center resources.

For GPU operators, the improvement significantly lowers the cost per million tokens while greatly increasing capacity; each user, meanwhile, gets more tokens per second, faster responses, and a better experience.


With Dynamo, servers can hit the sweet spot between throughput and response speed

Unlike CUDA, which is the underlying foundation for GPU programming, Dynamo is a higher-level system that focuses on the intelligent distribution and management of large-scale inference loads. It is responsible for the distributed scheduling layer of inference optimization and is located between the application and the underlying computing infrastructure. But just as CUDA revolutionized the GPU computing landscape more than a decade ago, Dynamo may also successfully create a new paradigm for the efficiency of inference software and hardware.

Dynamo is fully open source and supports mainstream frameworks from PyTorch to TensorRT. Open source or not, it is still a moat: like CUDA, it runs only on NVIDIA GPUs, as part of NVIDIA's AI inference software stack.

With this software upgrade, NVIDIA has built its own defense against dedicated inference ASICs such as Groq's: only combined hardware and software can dominate inference infrastructure.

2) Llama Nemotron: efficient new models, but still no match for DeepSeek

Although Dynamo is genuinely impressive on server utilization, in training models NVIDIA still lags the real experts a little.

At this GTC, NVIDIA introduced a new model family, Llama Nemotron, focused on efficiency and accuracy. Derived from the Llama series and specially fine-tuned by NVIDIA, it has been slimmed down through algorithmic pruning to be lighter than the original Llama (the Super tier weighs in at 49B parameters) while offering reasoning capability comparable to o1. Like Claude 3.7 and Grok 3, Llama Nemotron has a built-in reasoning switch that users can toggle on or off. The series comes in three tiers: entry-level Nano, mid-range Super, and flagship Ultra, each aimed at enterprises of a different scale.


Specific statistics of Llama Nemotron

On efficiency: the model's fine-tuning dataset is composed entirely of synthetic data generated by NVIDIA itself, about 60B tokens in total. Where DeepSeek V3 took 1.3 million H100 hours for a full training run, this model, with only about 1/15 the parameters of DeepSeek V3, spent 360,000 H100 hours just on fine-tuning. Its training efficiency sits a tier below DeepSeek's.

On inference efficiency, the Llama Nemotron Super 49B model does far outperform its predecessor, with 5x the token throughput of Llama 3 70B; it can handle more than 3,000 tokens per second on a single data-center GPU. Yet in the data released on the final day of DeepSeek's Open Source Week, each H800 node averaged about 73.7k tokens/s of input (including cache hits) during prefill, or about 14.8k tokens/s of output during decode. The gap between the two remains obvious.


On performance, the 49B Llama Nemotron Super beats the DeepSeek-R1-distilled Llama 70B model on every metric. But given the recent flood of small, high-capability models such as Qwen's QwQ 32B, Llama Nemotron Super will likely struggle to stand out among models that can already trade blows with R1.

Most importantly, this model proves that DeepSeek may understand how to tune GPUs during training better than NVIDIA does.

3) The new model is just the appetizer of NVIDIA's AI Agent ecosystem; NVIDIA AIQ is the main course

Why did NVIDIA build a reasoning model? Mainly to prepare for what Huang sees as the next AI explosion: the AI Agent. With OpenAI, Anthropic, and other major labs laying the groundwork for agents through Deep Research and MCP, NVIDIA evidently also believes the Agent era has arrived.

The NVIDIA AIQ project is NVIDIA's attempt. It provides a ready-made AI Agent workflow for planning, with the Llama Nemotron reasoning model at its core. The project sits at NVIDIA's "Blueprint" level, meaning a set of pre-configured reference workflows and templates that help developers integrate NVIDIA's technologies and libraries more easily; AIQ is NVIDIA's Agent template.


NVIDIA AIQ architecture

Like Manus, it integrates external tools such as web search engines and other specialized AI agents, letting the agent itself search and invoke a variety of tools. It completes the user's tasks through planning, reflection, and refinement driven by the Llama Nemotron reasoning model, and it also supports building multi-agent workflow architectures.
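The plan-act-reflect loop such a workflow wraps around the reasoning model can be sketched generically. The `llm` stub and the `search` tool below are placeholders of our own, not NVIDIA's API:

```python
# Generic plan-act-reflect agent loop; the "llm" and its tool are stand-ins.
def llm(prompt: str) -> str:
    # Placeholder for a reasoning-model call (e.g. Llama Nemotron). This stub
    # plans a search until it has observed an answer, then declares it done.
    if "observed: Paris" in prompt:
        return "done: Paris"
    return "search: capital of France"

TOOLS = {"search": lambda q: {"capital of france": "Paris"}.get(q.lower(), "no result")}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = []
    for _ in range(max_steps):
        step = llm(f"plan next step for: {task}; so far: {history}")  # plan
        if step.startswith("done:"):
            return step.removeprefix("done:").strip()
        tool, _, arg = step.partition(": ")
        history.append((step, TOOLS[tool](arg)))                      # act
        task = f"{task} (observed: {history[-1][1]})"                 # reflect
    return "gave up"

print(run_agent("capital of France"))  # → Paris
```

The loop is the skeleton; in a real AIQ-style workflow the stub is replaced by the reasoning model and the tool table by web search, other agents, and enterprise connectors.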


The ServiceNow system built on this template

Going a step further than Manus, it includes a sophisticated RAG system for enterprise files, spanning extraction, embedding, vector storage, reranking, and final processing by the LLM, ensuring that enterprise data is usable by agents.
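That extract-embed-store-rerank-generate pipeline can be sketched end to end with a toy bag-of-words embedding. Everything here is illustrative, not NVIDIA's stack:

```python
# Toy RAG retrieval: embed documents, rank by cosine similarity, return context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # bag-of-words stand-in for an embedding model

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [  # "extracted" enterprise files
    "Q3 revenue grew 12 percent year over year",
    "the cafeteria menu changes every monday",
    "data center revenue reached a record high in Q3",
]
store = [(d, embed(d)) for d in docs]  # the "vector store"

def retrieve(query: str, k: int = 2):
    q = embed(query)
    ranked = sorted(store, key=lambda de: cosine(q, de[1]), reverse=True)  # rerank
    return [d for d, _ in ranked[:k]]  # top-k context handed to the LLM

print(retrieve("how did Q3 revenue do"))
```

A real pipeline swaps in a learned embedding model, an ANN index, and a cross-encoder reranker, but the data flow stays exactly this shape.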

On top of that, NVIDIA also launched an AI data platform that connects AI reasoning models to enterprise data systems, forming a Deep Research for enterprise data. It marks a major evolution in storage technology: storage systems are no longer mere data warehouses but intelligent platforms with active reasoning and analysis capabilities.


AI Data Platform Structure

AIQ also places great emphasis on observability and transparency, which matters for security and for subsequent improvement: development teams can monitor agent activity in real time and continuously optimize the system based on performance data.

Overall, NVIDIA AIQ is a standard agent-workflow template offering a range of agent capabilities. It is a more turnkey, Dify-like agent-building tool, evolved for the reasoning era.

A foundation model for humanoid robots: NVIDIA aims to close the loop on the embodied-AI ecosystem

1) Cosmos, enabling embodied intelligence to understand the world

If the Agent push is an investment in the present, NVIDIA's layout in embodied intelligence looks like a consolidation of the future.

NVIDIA has lined up all three elements of model-building: models, data, and computing power.

Start with the model. At this GTC, NVIDIA released an upgraded version of Cosmos, the embodied-intelligence foundation model announced in January.

Cosmos is a model that predicts future images from current ones. It can generate detailed video from text/image input and predict how a scene will evolve by combining its current state (image/video) with actions (prompts/control signals). Because this demands an understanding of the world's physical causality, NVIDIA calls Cosmos a world foundation model (WFM).


The basic architecture of Cosmos

For embodied intelligence, predicting how a machine's behavior affects the external world is the most central capability; only then can a model plan actions based on predictions. The world model therefore becomes the foundation model of embodied intelligence. With this base model for predicting how the physical world changes with behavior over time, fine-tuning on specific datasets such as autonomous driving or robotic tasks lets the model meet the deployment needs of all kinds of physically embodied AI.

The model comprises three capabilities. The first, Cosmos Transfer, converts structured video-plus-text input into controllable, realistic video output, conjuring large-scale synthetic data out of thin air from text. This attacks embodied intelligence's biggest bottleneck today: insufficient data. Moreover, the generation is "controllable," meaning users can specify parameters (weather conditions, object properties, etc.) and the model adjusts its output accordingly, making data generation more controllable and targeted. The whole process can also be combined with Omniverse and Cosmos.


Cosmos builds its reality simulation on top of Omniverse

The second part, Cosmos Predict, can generate virtual world states from multimodal inputs, supporting multi-frame generation and motion trajectory prediction. This means that given the start and end states, the model can generate reasonable intermediate processes. This is the core physical world cognition and construction capability.

The third part, Cosmos Reason, is an open, fully customizable model with spatial-temporal awareness that understands video data and predicts interaction outcomes through chain-of-thought reasoning. This advances the ability to plan behaviors and predict their results.

As these three capabilities stack up, Cosmos achieves a complete behavioral chain: from real-image tokens plus text-command tokens in, to machine-action tokens out.

This foundation model does seem to work: just two months after launch, three leading companies, 1X, Agility Robotics, and Figure AI, have begun using it. NVIDIA's large language models may not lead the field, but in embodied intelligence it is squarely in the first tier.

2) Isaac GR00T N1, the world's first humanoid robot basic model

With Cosmos in hand, NVIDIA naturally used the framework to fine-tune and train Isaac GR00T N1, a foundation model built specifically for humanoid robots.


Isaac GR00T N1 dual system architecture

It adopts a dual-system architecture: a fast-reacting "System 1" and a deep-reasoning "System 2." Comprehensive fine-tuning lets it handle general tasks such as grasping, moving, and dual-arm manipulation, and it can be fully customized for specific robots: developers can post-train it with real or synthetic data, allowing deployment across robots of very different form factors.
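The dual-system split can be sketched as a slow planner producing subgoals that a fast control loop executes. This is purely illustrative (in GR00T N1, System 2 is reportedly a vision-language model and System 1 a learned low-level action policy):

```python
# Toy dual-system controller: System 2 deliberates, System 1 acts on each subgoal.
def system2_plan(goal: str) -> list[str]:
    # Slow, deliberate planning (stand-in for a vision-language reasoning model).
    return [f"reach {goal}", f"grasp {goal}", f"lift {goal}"]

def system1_act(subgoal: str) -> str:
    # Fast reactive control (stand-in for a learned low-level motor policy).
    return f"motor commands for '{subgoal}'"

def run(goal: str) -> list[str]:
    # Plan once (slow), then execute each subgoal with the fast controller.
    return [system1_act(s) for s in system2_plan(goal)]

print(run("cup"))
```

The division of labor is the point: the expensive reasoning model runs rarely, while the cheap reactive policy runs at control-loop frequency.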

For example, NVIDIA, Google DeepMind, and Disney jointly developed the Newton physics engine and used Isaac GR00T N1 as the base to drive a highly unusual little Disney BDX robot, which shows how versatile the model is. Newton itself is a very fine-grained physics engine, detailed enough to build a physics-based reward system for training embodied intelligence in virtual environments.


Jensen Huang and the BDX robot interact "passionately" on stage

3) Data generation: a two-pronged approach

NVIDIA combined NVIDIA Omniverse with the NVIDIA Cosmos Transfer world foundation model described above to build the Isaac GR00T Blueprint, which can generate large quantities of synthetic motion data from a small number of human demonstrations for robot-manipulation training. Using the Blueprint's first components, NVIDIA generated 780,000 synthetic trajectories in just 11 hours, equivalent to 6,500 hours (about nine months) of human demonstration data. A considerable share of Isaac GR00T N1's data comes from this, and it makes GR00T N1 perform 40% better than with real data alone.
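The leverage of synthetic generation over human demonstration is easy to quantify from the figures above:

```python
# Leverage of synthetic trajectory generation over human demonstration,
# using the figures quoted in the text above.
synthetic_hours = 11
equivalent_human_hours = 6_500
trajectories = 780_000

speedup = equivalent_human_hours / synthetic_hours  # ~591x faster than demonstration
traj_per_hour = trajectories / synthetic_hours      # ~70,909 trajectories per hour
print(round(speedup), round(traj_per_hour))
```

A roughly 590x collection-speed multiplier is why synthetic data can supply "a considerable share" of a foundation model's training set.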


Twin Simulation System

Through Omniverse, a purely virtual system, and Cosmos Transfer, a real-world image-generation system, NVIDIA can supply every model with large volumes of high-quality data, covering the second of the three elements: data.

4) A trinity of computing systems: a robot-computing empire from training to the edge

Since last year, Huang has stressed the concept of "three computers" at GTC. The first is DGX, the large GPU server used to train AI, including embodied intelligence. The second is AGX, NVIDIA's embedded computing platform for edge computing and autonomous systems, used to deploy AI on-device, for example as the core chip for autonomous driving or robots. The third is the data-generation computer, Omniverse + Cosmos.


Three major computing systems of embodied intelligence

Huang brought up this system again at this GTC, adding that with this computing stack, billions of robots can be created, with all the computing power, from training to deployment, coming from NVIDIA. This part of the loop is closed as well.

Conclusion

Compared simply against the previous-generation Blackwell chip, Blackwell Ultra's hardware hardly matches earlier labels like "nuclear bomb" or "king bomb"; it even has a whiff of toothpaste-squeezing about it.

But viewed against the roadmap, all of this is part of Jensen Huang's plan. Next year and the year after, the Rubin architecture will bring marked advances in chip process, transistors, rack integration, GPU interconnect, and cabinet interconnect; as the Chinese saying goes, "the best is yet to come."

Compared with the toothpaste-squeezing on the hardware side, NVIDIA's progress on the software side over the past two years has been rapid.

Across NVIDIA's entire software ecosystem, the three tiers of NeMo, NIM, and Blueprint form a full-stack solution running from model optimization and model packaging to application construction. The niches of the cloud-service companies and NVIDIA AI overlap everywhere. With the newly added Agent capabilities, NVIDIA's AI-infra ambitions now cover every layer except the foundation models themselves.

When it comes to software, Huang's appetite is as big as Nvidia's stock price.

In the robotics market, NVIDIA's ambitions are greater still: models, data, and computing power are all in its hands. It missed the top spot in foundation language models, but it has made up for that in embodied-intelligence foundations; quietly, a monopoly giant of embodied intelligence is rising over the horizon.

Each link and each product here corresponds to a potential market worth hundreds of billions. Jensen Huang, the lucky gambler who bet everything in his early years, has started a bigger wager with the money earned from his GPU monopoly.

If NVIDIA wins either the software or the robotics market in this gamble, it will be the Google of the AI era, the top monopolist in the food chain.

But looking at Nvidia's GPU profit margins, we still hope that such a future will not come.

Fortunately for Huang, this is a gamble bigger than any he has made in his life, and the outcome is anyone's guess.
