Jensen Huang's Latest CES Keynote: Three Mass-Produced Blackwell Chips Launched; AI Agents Hold Multi-Trillion-Dollar Potential


At the opening of CES 2025 this morning, NVIDIA founder and CEO Jensen Huang delivered a landmark keynote revealing the future of AI and computing. From the Token concept at the core of generative AI, to the launch of the new Blackwell-architecture GPUs, to an AI-driven digital future, the speech ranged across domains in ways that will profoundly impact the entire industry.

1) From Generative AI to Agentic AI: The Dawn of a New Era

· The Birth of the Token: As the core driving force of generative AI, Tokens transform words into knowledge and infuse images with life, opening up new forms of digital expression.

· The Evolution Path of AI: From Perceptual AI, Generative AI to Agentic AI capable of reasoning, planning, and acting, AI technology continues to reach new heights.

· The Transformer Revolution: Since its introduction in 2018, this technology has redefined how computing is done and completely disrupted the traditional technology stack.

2) Blackwell GPU: Breakthrough Performance Limits

· The New GeForce RTX 50 Series: Based on the Blackwell architecture, it has 92 billion transistors, 4000 TOPS of AI performance, and 4 PetaFLOPS of computing power, tripling the performance of the previous generation.

· The Fusion of AI and Graphics: For the first time, programmable shaders and neural networks are combined, introducing neural texture compression and material shading technologies, bringing stunning rendering effects.

· Democratizing High Performance: A $1,299 RTX 5070 laptop delivers RTX 4090-class performance, driving the widespread adoption of high-performance computing.

3) Multi-Domain Expansion of AI Applications

· Enterprise-Level AI Agents: NVIDIA provides tools like NeMo and Llama Nemotron to help enterprises build digital employees capable of autonomous reasoning, enabling intelligent management and services.

· Physical AI: Through the Omniverse and Cosmos platforms, AI is integrated into the industrial, autonomous driving, and robotics fields, redefining global manufacturing and logistics.

· Future Computing Scenarios: NVIDIA is bringing AI from the cloud to personal devices and enterprises, covering all computing needs from developers to regular users.

The following are the main contents of Jensen Huang's keynote speech:

This is the birthplace of intelligence, a new kind of factory - a generator of Tokens. It is the building block of AI, opening up a new domain and taking the first step into an extraordinary world. Tokens transform words into knowledge and infuse images with life; they turn creativity into videos and help us navigate any environment safely; they teach robots to move like masters and inspire us to celebrate victories in new ways. When we need it most, Tokens also bring inner peace. They imbue the digital with meaning, helping us better understand the world, predict potential dangers, and find ways to address internal threats. They can make our visions a reality and restore what we have lost.

All of this began in 1993, when NVIDIA launched its first product, the NV1. We wanted to build computers that could do things ordinary PCs couldn't, making it possible to put a game console inside a PC. Then in 1999, NVIDIA invented the programmable GPU, kicking off more than 20 years of technological progress that made modern computer graphics possible. Six years later, we introduced CUDA, exposing the GPU's programmability to a rich set of algorithms. The technology was initially hard to explain, but by 2012 the success of AlexNet had validated CUDA's potential and driven a breakthrough in AI.

Since then, AI has been advancing at an astonishing pace. From Perceptual AI to Generative AI, and then to Agentic AI capable of perception, reasoning, planning, and action, AI's capabilities have kept improving. In 2018, Google introduced the Transformer, and the world of AI truly took off. The Transformer not only fundamentally changed the landscape of AI, it redefined the entire computing field. We realized that machine learning is not just a new application or business opportunity, but a fundamental revolution in how computing is done. From hand-written instructions to neural networks optimized by machine learning, every layer of the technology stack has undergone massive change.

Today, AI applications are ubiquitous. Whether it is understanding text, images, or sound, or translating between amino-acid sequences and physics, AI can do it all. Almost every AI application boils down to three questions: What modality of information did it learn from? What modality does it translate into? What modality does it generate? This fundamental concept drives every AI-powered application.
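To make that framing concrete, here is a minimal sketch (an editorial illustration, not from the keynote; the application names and modality labels are merely examples) that treats each AI application as a modality-to-modality mapping:

```python
# A minimal sketch (not from the keynote): framing AI applications as
# modality-to-modality mappings, per the three questions above.
# The example entries are illustrative, not an official taxonomy.

applications = {
    #  name                 (learned from,    generates)
    "chatbot":              ("text",          "text"),
    "image generator":      ("text + images", "images"),
    "speech recognition":   ("audio",         "text"),
    "structure prediction": ("amino acids",   "3D structure"),
}

for name, (src, dst) in applications.items():
    print(f"{name}: learns from {src} -> generates {dst}")
```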

All these achievements are inseparable from GeForce. GeForce brought AI to the masses, and now AI is coming home to GeForce. With real-time ray tracing we can render graphics with stunning effects, and through DLSS, AI can even perform frame generation, predicting future frames. Only 2 million out of 33 million pixels are actually computed; the rest are predicted by AI. This miraculous technology demonstrates AI's power, making computing more efficient and revealing endless possibilities for the future.
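A quick back-of-the-envelope check of those figures (using only the numbers quoted above) shows just how little of the image is conventionally rendered:

```python
# Back-of-the-envelope check of the figures quoted above:
# ~2 million pixels rendered, ~33 million displayed (roughly several
# frames of a 4K image, where 3840 x 2160 is ~8.3 million pixels).

rendered_pixels = 2_000_000
displayed_pixels = 33_000_000

computed_fraction = rendered_pixels / displayed_pixels
predicted_fraction = 1 - computed_fraction

print(f"Computed by shaders: {computed_fraction:.1%}")   # ~6.1%
print(f"Predicted by AI:     {predicted_fraction:.1%}")  # ~93.9%
```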

This is why so many amazing things are happening now. We used GeForce to drive the development of AI, and now AI is completely transforming GeForce. Today, we are announcing the next-generation product - the RTX Blackwell family. Let's take a look.

This is the new GeForce RTX 50 series, based on the Blackwell architecture. This GPU is a performance monster, with 92 billion transistors, 4000 TOPS of AI performance, and 4 PetaFLOPS of AI compute, triple the previous Ada generation. All of this is needed to generate the stunning pixels I just showed you. It also offers 380 ray-tracing TeraFLOPS, to produce the most beautiful image possible for the pixels that must be computed, along with 125 shader TeraFLOPS. The card uses Micron GDDR7 memory running at 1.8 TB/s, double the bandwidth of the previous generation.

We can now combine AI workloads with computer graphics workloads, and a remarkable feature of this generation is that the programmable shaders can also handle neural networks. This has allowed us to invent neural texture compression and neural material shading. These technologies use AI to learn textures and compression algorithms, ultimately generating the stunning image effects that only AI can achieve.

Even in terms of mechanical design, this card is a marvel. It uses a dual-fan design, with the entire card acting like a giant fan, and the internal voltage regulation modules are state-of-the-art. This exceptional design is entirely due to the efforts of the engineering team.

Next, let's look at the performance comparison. The familiar RTX 4090, priced at $1,599, is a core investment in a home PC entertainment setup. Now the RTX 50 series delivers even more, starting at just $549 for the RTX 5070, which matches RTX 4090 performance, and ranging up to the RTX 5090, which doubles the performance of the RTX 4090.

Even more impressive is that we've put this high-performance GPU into laptops. The RTX 5070 laptop is priced at $1299 but has the performance of the RTX 4090. This design combines AI and computer graphics technology to achieve high efficiency and high performance.

The future of computer graphics will be neural rendering - the fusion of AI and computer graphics. The Blackwell series can even achieve this in laptops as thin as 14.9mm, with the full range of products from the RTX 5070 to the RTX 5090 suitable for ultra-thin laptops.

GeForce has driven the popularization of AI, and now AI is completely transforming GeForce. This is the mutual promotion of technology and intelligence, and we are moving towards a higher realm.

The Three Scaling Laws of AI

Next, let's talk about the direction of AI development.

1) Pre-training Scaling Law

The AI industry is scaling up at an accelerating pace, driven by a powerful empirical rule known as the "Scaling Law". Repeatedly verified by researchers and industry alike, it says that the more training data, the larger the model, and the more compute invested, the more capable the model becomes.

The growth rate of data is accelerating exponentially. It is estimated that in the coming years, the amount of data produced by humans annually will exceed the total amount produced throughout human history. This data is becoming multimodal, including forms such as video, images, and audio. This massive data can be used to train the fundamental knowledge system of AI, providing a solid knowledge foundation for AI.
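The keynote states this law only qualitatively. In the research literature it is commonly written as a power law in model size and data; the Chinchilla-style form below is the standard published formulation (Hoffmann et al., 2022), included here for reference rather than taken from the speech:

```latex
% Standard empirical form of the pre-training scaling law
% (Hoffmann et al., 2022); L is loss, N parameters, D training tokens.
L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
```

Here E is the irreducible loss and A, B, α, β are fitted constants; loss falls predictably as either N or D grows, which is exactly the "bigger data, bigger model, more compute" rule stated above.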

2) Post-Training Scaling Law

In addition, two other Scaling Laws are emerging.

The second Scaling Law is the "Post-Training Scaling Law", which involves technologies such as reinforcement learning and human feedback. In this way, AI generates answers based on human queries and continuously improves from human feedback. This reinforcement learning system, through high-quality prompts, helps AI to refine its skills in specific areas, such as being better at solving math problems or performing complex reasoning.

The future of AI is not just about perception and generation, but a process of constant self-improvement and boundary-breaking. It's like having a tutor or coach who provides feedback after you complete a task. Through testing, feedback, and self-improvement, AI can also progress through similar reinforcement learning and feedback mechanisms. This post-training stage of reinforcement learning, combined with synthetic data generation technology, is similar to a self-practice process. AI can face complex and verifiable problems, such as proving theorems or solving geometry problems, and continuously optimize its answers through reinforcement learning. Although this post-training requires massive computing power, it can ultimately create extraordinary models.
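The feedback loop described here can be sketched in a few lines. The following is a conceptual illustration only, not any specific NVIDIA or OpenAI pipeline; `model_generate`, `reward_score`, and `model_update` are hypothetical stand-ins for a real language model, a reward model or verifier, and an optimizer:

```python
# Minimal sketch of a post-training feedback loop in the spirit of
# reinforcement learning from feedback, as described above.
import random

def model_generate(prompt: str) -> str:
    # Placeholder: a real system would sample from a language model.
    return f"candidate answer to: {prompt} (v{random.randint(0, 9)})"

def reward_score(prompt: str, answer: str) -> float:
    # Placeholder: a real system would use human feedback or a
    # verifier (e.g. checking a proof or running unit tests).
    return random.random()

def model_update(prompt: str, answer: str, reward: float) -> None:
    # Placeholder: a real system would apply a policy-gradient step.
    pass

prompts = ["prove the triangle inequality", "solve x^2 - 5x + 6 = 0"]
for step in range(100):
    prompt = random.choice(prompts)
    answer = model_generate(prompt)
    reward = reward_score(prompt, answer)
    model_update(prompt, answer, reward)  # the model improves from feedback
```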

3) Inference Time Scaling Law

The Inference-Time Scaling Law is also gradually emerging, and it shows its potential when AI is actually used. During inference, AI can dynamically allocate resources: rather than being limited to what its parameters encode, it decides how much computation to spend in order to generate high-quality answers.

This process is similar to reasoning and thinking, rather than direct inference or one-time response. AI can break down problems into multiple steps, generate multiple solutions, and evaluate them to choose the optimal solution. This long-term reasoning has a significant effect on improving model capabilities.
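The "generate several solutions and pick the best" strategy corresponds to what the literature calls best-of-N sampling. Below is a minimal hedged sketch of the idea; `generate_candidate` and `evaluate` are hypothetical placeholders for a model and a scorer:

```python
# Minimal sketch of inference-time scaling via best-of-N sampling:
# spend more compute at answer time by generating several candidate
# solutions and keeping the highest-scoring one.
import random

def generate_candidate(problem: str) -> str:
    # Placeholder for sampling one solution attempt from a model.
    return f"solution draft {random.randint(0, 999)} for {problem!r}"

def evaluate(problem: str, candidate: str) -> float:
    # A real scorer might be a verifier, a reward model, or majority vote.
    return random.random()

def solve(problem: str, n_samples: int) -> str:
    # More samples = more inference-time compute = better expected answer.
    candidates = [generate_candidate(problem) for _ in range(n_samples)]
    return max(candidates, key=lambda c: evaluate(problem, c))

print(solve("route 5 deliveries under a time budget", n_samples=16))
```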

We have watched this technology evolve, from ChatGPT to GPT-4, and now to Gemini Pro; all of these systems have progressed step by step through pre-training, post-training, and inference-time scaling. Achieving these breakthroughs requires massive computing power, which is the core value of NVIDIA's Blackwell architecture.

An Update on the Blackwell Architecture

The Blackwell system is in full production, and its performance is impressive. Today, every cloud service provider is deploying these systems, which are manufactured in 45 factories worldwide, supporting up to 200 configurations, including liquid cooling, air cooling, x86 architecture, and NVIDIA Grace CPU versions.

The core component, the NVLink rack system, weighs 1.5 tons and contains 600,000 parts, roughly the complexity of 20 cars, connected by 2 miles of copper wiring and 5,000 cables. The manufacturing process is extremely complex, but the goal is to meet the ever-growing demand for computing power.

Compared to the previous generation, Blackwell improves performance per watt by 4x and performance per dollar by 3x. That means that for the same cost, the models you can train are 3x larger, and what these gains ultimately feed is the generation of AI tokens. Tokens are consumed everywhere, in ChatGPT, Gemini, and countless other AI services, and they are the foundation of future computing.

Building on this, NVIDIA has driven a new computing paradigm, neural rendering, which seamlessly fuses AI and computer graphics. The 72 Blackwell GPUs linked by NVLink behave like the world's largest single chip, delivering up to 1.4 ExaFLOPS of AI floating-point performance with an astonishing 1.2 PB/s of memory bandwidth, comparable to the entire world's internet traffic. This level of compute lets AI handle more complex reasoning tasks while significantly cutting costs, laying the groundwork for more efficient computing.

AI Agent System and Ecosystem

Looking to the future, the AI reasoning process will no longer be a simple single-step response, but more akin to an "internal dialogue". Future AI will not only generate answers, but also reflect, reason, and continuously optimize. As the rate of AI token generation increases and the cost decreases, the service quality of AI will be significantly improved, meeting a wider range of application needs.

To help enterprises build AI systems with autonomous reasoning capabilities, NVIDIA provides three key tools: NVIDIA NeMo, NIM AI microservices, and acceleration libraries. By packaging complex CUDA software and deep learning models into containerized services, enterprises can deploy these AI models on any cloud platform and quickly develop domain-specific AI Agents, such as service tools that support enterprise management or digital employees that interact with users.
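As an illustration of this deployment model, here is a hedged sketch of querying such a containerized microservice. NIM-style services generally expose an OpenAI-compatible chat endpoint, but the host, port, and model name below are assumptions for the example, not confirmed details from the keynote:

```python
# Hedged sketch: querying a containerized NIM-style microservice over
# an OpenAI-compatible HTTP API. The URL and model id are illustrative
# assumptions; adapt them to an actual deployment.
import json
import urllib.request

payload = {
    "model": "meta/llama-3.1-8b-instruct",  # assumed model id
    "messages": [{"role": "user", "content": "Summarize open tickets."}],
    "max_tokens": 128,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumed local deployment
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```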

These models open up new possibilities for enterprises, not only lowering the development threshold for AI applications, but also driving the entire industry to take a firm step towards Agentic AI (autonomous AI). In the future, AI will become digital employees that can be easily integrated into enterprise tools like SAP and ServiceNow, providing intelligent services in different environments. This is the next milestone in the expansion of AI, and the core vision of NVIDIA's technology ecosystem.

Then there is the training and evaluation system. These AI Agents are essentially digital labor, working alongside your employees to complete tasks for you, so introducing specialized Agents into your company is like onboarding new employees. We provide tool libraries that help these AI Agents learn your company's particular language, vocabulary, business processes, and ways of working. You supply examples of the desired work output, they attempt to produce it, and you give feedback and evaluation, and so on. You also set restrictions, clearly defining what operations they may not perform and what they may not say, and you control what information they can access. This entire digital-employee pipeline is called NeMo. In a sense, every company's IT department will become the HR department for AI Agents.
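The "restrictions" idea above amounts to a policy layer in front of the agent. Here is a generic hedged sketch of such a layer; this is illustrative only, not the NeMo API, and the action names and blocked topics are invented for the example:

```python
# Hedged sketch of agent restrictions: a simple policy layer that
# filters what a digital employee may do or say. Generic illustration,
# not NVIDIA NeMo; all names here are invented.
ALLOWED_ACTIONS = {"read_ticket", "draft_reply", "search_kb"}
BLOCKED_TOPICS = {"salary data", "unreleased products"}

def guard(action: str, message: str) -> bool:
    """Return True only if the action is permitted and the message
    touches no blocked topic."""
    if action not in ALLOWED_ACTIONS:
        return False  # the agent may not perform this operation
    return not any(t in message.lower() for t in BLOCKED_TOPICS)

print(guard("draft_reply", "Here is the fix for your login issue."))  # True
print(guard("send_email", "Quarterly update"))                        # False
```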

Today, the IT department manages and maintains a large number of software; in the future, they will manage, cultivate, onboard, and improve a large number of digital Agents to provide services for the company. Therefore, the IT department will gradually evolve into the HR department for AI Agents.

In addition, we provide many open-source blueprints for the ecosystem to use, and users are free to modify them. We have provided blueprints for many types of Agents. Today we are also announcing something very cool and smart: a brand-new model family based on Llama, the NVIDIA Llama Nemotron language foundation model series.

Llama 3.1 is a phenomenal model. Meta's Llama 3.1 has been downloaded roughly 650,000 times and has spawned about 60,000 derivative models. This is one of the core reasons almost every enterprise and industry has started working on AI. We recognized that Llama could be fine-tuned much further for enterprise use cases, and leveraging our expertise and capabilities, we have fine-tuned it into the Llama Nemotron open model suite.

These models come in different sizes: small models that respond quickly; the mainstream Llama Nemotron Super models for general-purpose use; and the largest Ultra models, which can serve as teacher models, used to evaluate other models, to generate answers and judge their quality, or as knowledge-distillation sources. All of these models are available online now.

These models perform excellently, ranking at the top on dialogue, instruction-following, and information-retrieval benchmarks, making them well suited for AI Agent workloads worldwide.

Our collaboration with the ecosystem is also very close, including work with ServiceNow, SAP, and Siemens on industrial AI, and excellent projects from companies like Cadence and Perplexity. Perplexity has disrupted the search field, while Codeium serves software engineers, of whom there are 30 million worldwide; AI assistants will greatly improve developer productivity, the next huge application area for AI services. With 1 billion knowledge workers globally, AI Agents could be the next robotics industry, with multi-trillion-dollar potential.

AI Agent Blueprints

Next, we will show some AI Agent blueprints completed in collaboration with partners.

AI Agents are the new digital labor force, able to assist or replace humans in completing tasks. NVIDIA's Agentic AI building blocks, NIM pre-trained models, and the NeMo framework help organizations easily develop and deploy AI Agents, which can be trained as experts in domain-specific tasks.

Here are four examples:

· Research Assistant Agent: Able to read complex documents such as lectures, journals, financial reports, and generate interactive podcasts for easy learning;

· Software Security AI Agent: Helps developers continuously scan software vulnerabilities and prompts them to take appropriate measures;

· Virtual Laboratory AI Agent: Accelerates compound design and screening, quickly finding potential drug candidates;

· Video Analysis AI Agent: Based on NVIDIA Metropolis blueprint, analyzes data from billions of cameras, generating interactive search, summarization, and reporting. For example, monitoring traffic flow, facility processes, and providing improvement suggestions.

The Dawn of the Physical AI Era

We aim to bring AI from the cloud to every corner, including inside companies and on personal PCs. NVIDIA is working to make Windows Subsystem for Linux 2 (WSL 2) the preferred platform for AI, so that developers and engineers can more conveniently use NVIDIA's AI technology stack, including language models, image models, animation models, and more.

Additionally, NVIDIA has launched Cosmos, the first foundation-model development platform for the physical world. It focuses on understanding the dynamics of the physical world, such as gravity, friction, inertia, spatial relationships, and causality, and it can generate videos and scenes that obey physical laws, with wide application in training and validating robots, industrial AI, and multimodal language models.

Cosmos connects with NVIDIA Omniverse to provide physically grounded simulation, generating realistic and believable results. This combination is the core technology for developing robotics and industrial applications.

NVIDIA's industrial strategy is based on three computing systems:

· DGX systems for training AI;

· AGX systems for deploying AI;

· Digital twin systems for reinforcement learning and AI optimization.

Through the collaborative work of these three systems, NVIDIA is driving the development of robotics and industrial AI, building the future digital world. It's not a three-body problem, but a "three-computer" solution.

Let me show you three examples of NVIDIA's robot vision.

1) Industrial Visualization Applications

Currently there are millions of factories and tens of thousands of warehouses worldwide, forming the backbone of a $50 trillion manufacturing industry. In the future all of it needs to be software-defined, automated, and integrated with robotics. We are working with KION, a leading global warehouse automation solutions provider, and Accenture, the world's largest professional services firm, to focus on digital manufacturing and create some very special solutions. Our go-to-market approach is the same as for our other software and technology platforms, through developers and ecosystem partners, and more and more ecosystem partners are joining the Omniverse platform, because everyone wants to visualize the future of industry. Within that $50 trillion share of global GDP, there is enormous waste and enormous opportunity for automation.

Let's look at an example of KION and Accenture working with us:

KION (a supply chain solutions company), Accenture (a global professional services leader), and NVIDIA are bringing Physical AI to the trillion-dollar warehouse and distribution center market. Efficiently managing warehouse logistics means navigating a complex web of decisions shaped by constantly changing variables: daily and seasonal demand swings, space constraints, labor supply, and the integration of diverse robots and automation systems. Today, predicting the key performance indicators (KPIs) of a physical warehouse is nearly impossible.

To address these challenges, KION is adopting Mega (an NVIDIA Omniverse blueprint) to build an industrial digital twin for testing and optimizing its robot fleets. First, KION's warehouse management solution assigns tasks to the industrial AI brains in the digital twin, such as moving goods from buffer locations to shuttle storage. The robot fleet in the Omniverse physical warehouse simulation perceives, reasons, plans its next actions, and acts. The digital twin uses sensor simulation so the robot brains can see the state of task execution and decide the next step. Under Mega's precise tracking, the loop continues, measuring operational KPIs like throughput, efficiency, and utilization, all before any change is made to the physical warehouse.

Together with NVIDIA, KION and Accenture are redefining the future of industrial autonomy.

In the future, every factory will have a digital twin that is fully synchronized with the actual factory. You can use Omniverse and Cosmos to generate a multitude of future scenarios, and AI will determine the optimal KPI scenario, which will serve as the constraints and AI programming logic for the actual factory deployment.
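The "simulate many futures, deploy the best one" idea can be captured in a few lines. The sketch below is conceptual only; it is not the Mega or Omniverse API, and every name and number in it is an illustrative assumption:

```python
# Conceptual sketch of the idea above: simulate many candidate warehouse
# configurations in the digital twin and keep the one with the best KPI.
# NOT the Mega/Omniverse API; all names and numbers are illustrative.
import random

def simulate_kpi(config: dict) -> float:
    """Stand-in for a full digital-twin rollout; returns a noisy
    throughput estimate for one candidate configuration."""
    n = config["robots"]
    base = n * 0.8 - n * n * 0.01       # toy model: congestion at high counts
    return base + random.gauss(0, 0.5)  # noisy simulated measurement

candidates = [{"robots": n} for n in range(5, 50, 5)]
best = max(candidates, key=simulate_kpi)
print("Configuration to deploy to the physical facility:", best)
```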

2) Autonomous Driving

The autonomous driving revolution is here. After years of development, the success of Waymo and Tesla has proven the maturity of autonomous driving technology. Our solution provides three computing systems for this industry: a system for training AI (such as the DGX system), a system for simulation testing and synthetic data generation (such as Omniverse and Cosmos), and an in-vehicle computing system (such as the AGX system). Almost all major automakers globally are collaborating with us, including Waymo, Zoox, Tesla, and the world's largest electric vehicle company BYD. There are also companies like Mercedes, Lucid, Rivian, Xiaomi, and Volvo that are about to launch innovative vehicle models. Aurora is using NVIDIA technology to develop autonomous trucks.

There are 100 million vehicles manufactured each year, and 1 billion vehicles on the roads globally, traveling trillions of miles annually. These will gradually become highly automated or fully autonomous. This industry is expected to become the first trillion-dollar robot industry.

Today, we are announcing the launch of our next-generation in-vehicle computer, Thor. It is a universal robot computer capable of handling large amounts of data from cameras, high-resolution radars, and LiDARs. Thor is an upgrade to the current industry standard Orin, with 20 times the computing power, and is now in full-scale production. Meanwhile, NVIDIA's Drive OS is the first AI computing operating system certified to the highest functional safety standard (ISO 26262 ASIL D).

Autonomous Driving Data Factory

NVIDIA is leveraging Omniverse AI models and the Cosmos platform to create an autonomous driving data factory, significantly expanding training data through synthetic driving scenarios. This includes:

· OmniMap: Fusing map and geospatial data to build drivable 3D environments;

· Neural Reconstruction Engine: Using sensor logs to generate high-fidelity 4D simulation environments and generate scenario variants for training data;

· Edify 3DS: Searching asset libraries or generating new assets to create scenarios for simulation.

With these technologies, we are expanding thousands of driving scenarios into billions of miles of data for the development of safer and more advanced autonomous driving systems.

3) General Robotics

The era of general robotics is approaching, and the key to breakthroughs in this field is training. For humanoid robots, imitation data is relatively hard to acquire, but NVIDIA's Isaac GR00T provides a solution: it generates massive datasets through simulation, and combines the multiverse simulation engines of Omniverse and Cosmos for policy training, validation, and deployment.

For example, developers can remotely operate robots using Apple Vision Pro, capturing data without physical robots, and teaching task actions in a risk-free environment. Through Omniverse's domain randomization and 3D-to-real-world extension capabilities, exponentially growing datasets are generated, providing abundant resources for robot learning.
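Domain randomization, mentioned above, means perturbing simulation parameters so that each rollout looks different, multiplying one demonstration into many training variants. The sketch below illustrates the idea; the parameter names are invented for the example and are not the Omniverse or Isaac API:

```python
# Minimal sketch of domain randomization as described above: sample a
# fresh set of simulation parameters per rollout so one demonstration
# expands into many training variants. Parameter names are illustrative.
import random

def randomized_scene() -> dict:
    return {
        "lighting_lux":          random.uniform(100, 2000),
        "floor_friction":        random.uniform(0.4, 1.0),
        "object_pose_jitter_cm": random.uniform(0.0, 5.0),
        "camera_height_m":       random.uniform(1.2, 2.0),
    }

# One teleoperated demonstration becomes a much richer dataset:
demonstrations = ["pick_and_place_demo_01"]
dataset = [(demo, randomized_scene())
           for demo in demonstrations
           for _ in range(10_000)]
print(len(dataset), "randomized training variants")
```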

In summary, whether it's industrial visualization, autonomous driving, or general robotics, NVIDIA's technology is leading the future transformation of the Physical AI and robotics domains.

Finally, I have one more important thing to share, and all of it traces back to a project we launched internally ten years ago called Project DIGITS, short for Deep Learning GPU Intelligence Training System.

Before the official release, we shortened the name to DGX to harmonize with the company's RTX, AGX, OVX, and other product lines. The launch of the DGX-1 truly changed the direction of AI development, a milestone in NVIDIA's contribution to AI.

The Revolutionary DGX-1

The original intent of the DGX-1 was to give researchers and startups a plug-and-play AI supercomputer. In the past, getting a supercomputer meant building dedicated facilities and designing and constructing complex infrastructure just to bring one into existence. The DGX-1, by contrast, was a supercomputer purpose-built for AI development: no complex setup required, just plug it in and go.

I still remember delivering the first DGX-1 in 2016 to a startup called OpenAI. Elon Musk, Ilya Sutskever, and many NVIDIA engineers were there, and we celebrated its arrival together. That machine dramatically advanced AI computing.

Today, AI is everywhere, no longer confined to research institutions and startup labs. As I said earlier, AI has become a new way of computing and of building software. Every software engineer, every creative artist, indeed every ordinary computer user, needs an AI supercomputer.

The Latest AI Supercomputer

Here is NVIDIA's latest AI supercomputer. It still belongs to Project Digits, and we are still looking for a better name, so feel free to provide suggestions. This is a truly amazing device.

This supercomputer runs NVIDIA's full AI software stack, including DGX Cloud. It can serve as a cloud-connected supercomputer, a high-performance workstation, or even a desktop analytics machine. Most importantly, it is based on a new chip we developed in secret, code-named GB10, the smallest Grace Blackwell we have ever made.

I have a chip here to show you its internal design. This chip was co-developed with the global leading SoC company MediaTek. This custom CPU SoC is connected to the Blackwell GPU using NVLink chip-to-chip interconnect technology. This small chip is now in full production. We expect this supercomputer to be officially launched around May.

We even offer a "double-power" configuration, allowing these devices to be connected through ConnectX with GPUDirect technology. It is a complete supercomputing solution that can meet the needs of AI development, analytics, and industrial applications.

In addition, we announced the mass production of three new Blackwell system chips, the world's first physical AI foundation model, and breakthroughs in three robotics areas - autonomous AI agent robots, humanoid robots, and self-driving cars.
