At the opening of CES 2025 this morning, NVIDIA founder and CEO Jensen Huang delivered a landmark keynote revealing the future of AI and computing. From the Token, the core concept of generative AI, to the launch of the brand-new Blackwell-architecture GPUs, to an AI-driven digital future, the speech spanned domains in ways that will profoundly shape the entire industry.
1) From Generative AI to Agentic AI: The Dawn of a New Era
The Birth of the Token: As the core driver of generative AI, Tokens transform words into knowledge, infuse images with life, and open up new forms of digital expression.
The Evolution of AI: From Perceptual AI and Generative AI to Agentic AI capable of reasoning, planning, and acting, AI technology continues to reach new heights.
The Transformer Revolution: Since its introduction in 2018, this technology has redefined how computing is done and completely disrupted the traditional technology stack.
2) Blackwell GPU: Breakthrough Performance Limits
The New GeForce RTX 50 Series: Based on the Blackwell architecture, it has 92 billion transistors, 4000 TOPS AI performance, and 4 PetaFLOPS of AI compute, three times the performance of the previous generation.
The Fusion of AI and Graphics: For the first time, programmable shaders and neural networks are combined, introducing neural texture compression and material shading technologies, delivering stunning rendering effects.
Democratizing High Performance: The RTX 5070 laptop achieves RTX 4090 performance at $1299, driving the widespread adoption of high-performance computing.
3) Multi-Domain Expansion of AI Applications
Enterprise-Grade AI Agents: NVIDIA provides tools like NeMo and the Llama Nemotron models to help enterprises build autonomously reasoning digital employees, enabling intelligent management and services.
Physical AI: Through the Omniverse and Cosmos platforms, AI is being integrated into the industrial, autonomous driving, and robotics domains, redefining global manufacturing and logistics.
Future Computing Scenarios: NVIDIA is bringing AI from the cloud to personal devices and enterprise environments, covering all computing needs from developers to general users.
What follows is the main content of Jensen Huang's keynote speech:
This is the birthplace of intelligence, a new kind of factory: a generator of Tokens. Tokens are the building blocks of AI, opening up a new frontier and marking the first step into an extraordinary world. Tokens transform words into knowledge and infuse images with life; they turn creativity into video and help us navigate any environment safely; they teach robots to move like masters and inspire us to celebrate victories in new ways. When we need it most, Tokens can also bring inner peace. They give digits meaning, helping us better understand the world, predict potential dangers, and find cures for the threats within us. They can make our visions come true and restore what we have lost.
All of this AI began in 1993, when NVIDIA launched its first product, the NV1. We wanted to build computers that could do things ordinary PCs couldn't, making it possible to put a game console inside a PC. Then in 1999, NVIDIA invented the programmable GPU, kicking off more than 20 years of technological progress that made modern computer graphics possible. Six years later, we introduced CUDA, exposing the GPU's programmability to a rich set of algorithms. The technology was initially hard to explain, but by 2012 the success of AlexNet had validated CUDA's potential and driven a breakthrough in AI.
Since then, AI has advanced at an astonishing pace. From Perceptual AI to Generative AI, and then to Agentic AI capable of perception, reasoning, planning, and action, AI's capabilities have kept improving. In 2018, Google introduced the Transformer, and the world of AI truly took off. The Transformer not only fundamentally changed the AI landscape, it redefined the entire field of computing. We realized that machine learning is not just a new application or business opportunity, but a fundamental revolution in how we compute: from hand-writing instructions to optimizing neural networks with machine learning, every layer of the technology stack has undergone massive change.
Today, AI applications are ubiquitous. Whether it's understanding text, images, and sound, or translating amino acid sequences and physical phenomena, AI can handle it all. Almost every AI application boils down to three questions: What modality of information did it learn from? What modality does it translate to? What modality does it generate? This fundamental framing drives every AI-powered application.
None of these achievements would have been possible without GeForce. GeForce brought AI to the masses, and now AI is coming home to GeForce. With real-time ray tracing, we can render graphics with stunning fidelity. Through DLSS, AI goes beyond upscaling to frame generation, predicting future frames. Only 2 million of the 33 million pixels are actually computed; the rest are generated by AI prediction. This miraculous technology showcases AI's immense capability, making computing far more efficient and revealing endless possibilities for the future.
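A back-of-the-envelope reading of those numbers (an illustrative assumption on my part: one frame rendered at 1080p and upscaled to 4K, plus three AI-generated 4K frames) lines up neatly:

```python
# Rough arithmetic behind "2 million of 33 million pixels computed".
# Assumes one 1080p-rendered frame plus three AI-generated 4K frames;
# illustrative only, the real DLSS pipeline is more involved.

rendered = 1920 * 1080            # one frame traditionally rendered: ~2.07M pixels
displayed = 4 * (3840 * 2160)     # four 4K frames shown on screen: ~33.2M pixels

print(f"computed pixels:   {rendered:,}")
print(f"displayed pixels:  {displayed:,}")
print(f"fraction computed: {rendered / displayed:.1%}")   # ~6%
```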
This is why so many amazing things are happening now. We used GeForce to drive the development of AI, and now AI is completely transforming GeForce. Today, we are announcing the next-generation product - the RTX Blackwell family. Let's take a look.
This is the new GeForce RTX 50 series, based on the Blackwell architecture. This GPU is a performance monster, with 92 billion transistors, 4000 TOPS of AI performance, and 4 PetaFLOPS of AI compute, three times the previous-generation Ada architecture. All of that is there to generate the stunning pixels I just showed you. It also delivers 380 TFLOPS of ray tracing, so the pixels that do have to be computed can be as beautiful as possible, plus 125 TFLOPS of shader compute. The card uses GDDR7 memory from Micron, with bandwidth up to 1.8 TB/s, double the previous generation.
We can now combine AI workloads with computer graphics workloads, and a remarkable feature of this generation is that the programmable shaders can also run neural networks. This let us invent neural texture compression and neural material shading: techniques that use AI to learn textures and compression algorithms, producing image quality that only AI can achieve.
Even in the mechanical design, this card is a marvel. It uses a dual-fan design, with the entire card acting like a giant fan, and the voltage regulation modules are state-of-the-art. This exceptional design is entirely due to the efforts of the engineering team.
Next, let's look at the performance comparisons. The familiar RTX 4090, priced at $1599, has been the core investment in a home PC entertainment center. The RTX 50 series now starts at just $549, and the lineup runs from the RTX 5070 up to the RTX 5090, which delivers twice the performance of the RTX 4090.
Even more impressive is that we've put this high-performance GPU into laptops. The RTX 5070 laptop is priced at $1299 but has the performance of the RTX 4090. This design combines AI and computer graphics technology to achieve high efficiency and high performance.
The future of computer graphics will be neural rendering - the fusion of AI and computer graphics. The Blackwell series can even achieve this in laptops as thin as 14.9mm, with the full range of products from the RTX 5070 to the RTX 5090 suitable for ultra-thin laptops.
GeForce has driven the adoption of AI, and now AI is completely transforming GeForce. This is the mutual promotion of technology and intelligence, and we are moving towards a higher realm.
The Three Scaling Laws of AI
Next, let's talk about the direction of AI development.
1) Pre-training Scaling Law
The AI industry is scaling at an accelerating pace, driven by a powerful empirical rule known as the "Scaling Law". Repeatedly verified by researchers and industry alike, it states that the more training data, the larger the model, and the more compute invested, the more capable the resulting model will be.
The growth rate of data is accelerating exponentially. It is estimated that in the coming years, the annual data production by humans will exceed the total data produced throughout human history. This data is becoming more multimodal, including forms like video, images, and audio. This massive data can be used to train the AI's foundational knowledge base, providing a solid knowledge foundation for AI.
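As a minimal illustration of what such a law looks like in practice (synthetic numbers invented for this sketch, not measurements from any real model), loss typically falls as a power law in compute, which becomes a straight line in log-log space:

```python
# Minimal power-law scaling sketch: loss ≈ a * compute^(-b).
# All numbers are synthetic, purely to illustrate the shape of a scaling law.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # training FLOPs (made up)
loss = 3.2 * compute ** -0.05                        # synthetic power-law "measurements"

# A power law is a straight line in log-log space, so fit one there.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
print(f"fitted exponent: {slope:.3f}")               # recovers -0.05

# Extrapolate the fit to 10x more compute than we "measured".
predicted = np.exp(intercept) * 1e23 ** slope
print(f"predicted loss at 1e23 FLOPs: {predicted:.3f}")
```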
2) Post-training Scaling Law
In addition, two other Scaling Laws are emerging.
The second Scaling Law is the "Post-training Scaling Law", which involves techniques such as reinforcement learning from human feedback. Here, the AI generates answers to human queries and continuously improves from the feedback it receives. Through high-quality prompts, this reinforcement learning helps the AI sharpen its skills in specific areas, such as solving math problems or performing complex reasoning.
The future of AI is not just about perception and generation, but a process of constant self-improvement and boundary breaking. It's like having a tutor or coach who provides feedback after you complete a task. Through testing, feedback, and self-improvement, AI can also progress through similar reinforcement learning and feedback mechanisms. This post-training stage of reinforcement learning, combined with synthetic data generation technology, is similar to a self-practice process. AI can face complex and verifiable problems, such as proving theorems or solving geometry problems, and continuously optimize its answers through reinforcement learning. Although this post-training requires massive computing power, it can ultimately create extraordinary models.
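To make the "attempt, verify, reinforce" loop concrete, here is a deliberately trivial toy sketch (a random guesser on a verifiable arithmetic task; real post-training pipelines such as RLHF or RL on verifiable problems are vastly more sophisticated):

```python
# Toy sketch of the "attempt, verify, reinforce" loop described above.
# The "model" is just a random guesser on a verifiable arithmetic task;
# real post-training is vastly more complex than this.
import random

memory = {}  # problem -> answer that has passed verification

def generate(problem):
    if problem in memory:              # exploit what has been reinforced
        return memory[problem]
    return random.randint(0, 18)       # otherwise explore with a random guess

def verify(problem, answer):
    a, b = problem
    return answer == a + b             # verifiable task: single-digit addition

for _ in range(2000):                  # the self-practice loop
    problem = (random.randint(0, 9), random.randint(0, 9))
    answer = generate(problem)
    if verify(problem, answer):
        memory[problem] = answer       # reinforce the verified answer

print(f"mastered {len(memory)} of 100 possible problems")
```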
3) Test-time Scaling Law
The Test-time Scaling Law is also gradually emerging. It reveals the unique potential of AI at the moment it is actually used: the AI can dynamically allocate compute during inference, no longer limited to what was baked into its parameters, but spending additional computation to produce the high-quality answer that is required.
This process is similar to reasoning and thinking, rather than direct inference or one-time response. AI can break down the problem into multiple steps, generate multiple solutions and evaluate them, and finally choose the optimal solution. This long-term reasoning has a significant effect on improving model capabilities.
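The simplest form of this is best-of-N sampling: spend N times the inference compute, then keep the highest-scoring candidate. A toy sketch (the numeric "problem" and scorer are contrived; real systems use learned verifiers or self-evaluation):

```python
# Toy sketch of test-time scaling via best-of-N sampling: sample several
# candidate solutions, score each, keep the best. The "problem" and scorer
# are contrived stand-ins for a real task and verifier.
import random

TARGET = 10.0  # ground truth for our toy problem

def propose():
    return TARGET + random.gauss(0, 1.0)        # a noisy candidate solution

def score(candidate):
    return -abs(candidate - TARGET)             # toy verifier: closeness to truth

def best_of_n(n):
    return max((propose() for _ in range(n)), key=score)

for n in (1, 4, 16, 64):                        # more inference compute...
    errors = [abs(best_of_n(n) - TARGET) for _ in range(500)]
    print(f"N={n:3d}  mean error {sum(errors) / len(errors):.3f}")  # ...better answers
```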
We have watched this technology evolve, from ChatGPT to GPT-4 and now to Gemini Pro; all of these systems have progressed step by step through pre-training, post-training, and test-time scaling. Achieving these breakthroughs demands massive computing power, and that is the core value of NVIDIA's Blackwell architecture.
An Update on the Blackwell Architecture
The Blackwell system is in full production, with impressive performance. Today, every cloud service provider is deploying these systems, which are manufactured in 45 factories worldwide, supporting up to 200 configurations, including liquid cooling, air cooling, x86 architecture, and NVIDIA Grace CPU versions.
The core component, the NVLink system, weighs 1.5 tons and contains 600,000 parts, roughly the complexity of 20 cars, connected by 2 miles of copper wire and 5,000 cables. The manufacturing process is extraordinarily complex, all in service of meeting the ever-growing demand for compute.
Compared with the previous-generation architecture, Blackwell delivers a 4x improvement in performance per watt and a 3x improvement in performance per dollar. In other words, at the same cost, the scale of model training can grow 3x, and what these improvements ultimately buy is the generation of AI Tokens: the Tokens behind ChatGPT, Gemini, and countless other AI services, and the foundation of future computing.
Building on this, NVIDIA has driven a new computing paradigm, neural rendering, seamlessly integrating AI and computer graphics. Seventy-two Blackwell GPUs connected over NVLink form what is effectively the world's largest single chip, providing up to 1.4 ExaFLOPS of AI floating-point performance and an astonishing 1.2 PB/s of memory bandwidth, comparable to total global internet traffic. This super-scale compute lets AI tackle more complex reasoning tasks while significantly reducing cost, laying the foundation for more efficient computing.
AI Agent System and Ecosystem
Looking to the future, the AI reasoning process will no longer be a simple single-step response, but more akin to an "internal dialogue". Future AI will not only generate answers, but also reflect, reason, and continuously optimize. As the rate of AI Token generation increases and the cost decreases, the service quality of AI will significantly improve, meeting a wider range of application needs.
To help enterprises build AI systems with autonomous reasoning capabilities, NVIDIA provides three key layers: NVIDIA NeMo, NIM AI microservices, and acceleration libraries. By packaging complex CUDA software and deep learning models into containerized services, enterprises can deploy these models on any cloud platform and quickly develop domain-specific AI Agents, such as service tools that support enterprise management or digital employees that interact with users.
These models open up new possibilities for enterprises, not only lowering the development threshold for AI applications, but also driving the entire industry to take a firm step towards Agentic AI (autonomous AI). In the future, AI will become digital employees that can be easily integrated into enterprise tools like SAP and ServiceNow, providing intelligent services in different environments. This is the next milestone in the expansion of AI, and the core vision of NVIDIA's technology ecosystem.
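As a sketch of what consuming such a containerized model service can look like (assuming an OpenAI-compatible chat endpoint, which NIM-style microservices commonly expose; the URL, port, and model name below are placeholders):

```python
# Sketch: querying a containerized model microservice over an
# OpenAI-compatible REST API. URL, port, and model name are placeholders.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",   # local NIM-style endpoint (assumed)
    json={
        "model": "meta/llama-3.1-8b-instruct",     # example model id
        "messages": [{"role": "user", "content": "Summarize our Q4 sales report."}],
        "max_tokens": 256,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```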
Training and Evaluation Systems
In the future, these AI Agents will essentially be digital labor, working alongside your employees to complete tasks on your behalf. Introducing these specialized Agents into your company is therefore like onboarding new employees. We provide tool libraries that help these AI Agents learn your company's unique language, vocabulary, business processes, and working style. You supply examples of the desired work output, they attempt to produce it, and you give feedback and evaluation, and so on. You also set restrictions, clearly defining which operations they may not perform and what they may not say, and controlling which information they can access. This entire digital-employee pipeline is called NeMo. In a sense, every company's IT department will become the HR department for AI Agents.
Today, the IT department manages and maintains a large amount of software; in the future, it will manage, cultivate, onboard, and improve a large fleet of digital Agents that serve the company. The IT department will thus gradually evolve into the HR department for AI Agents.
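For the restrictions piece specifically, here is a minimal sketch of programmable guardrails using the open-source NeMo Guardrails library (the ./config directory, which would hold the YAML and Colang files defining what the Agent may not do or say, is assumed to exist):

```python
# Minimal sketch of wiring guardrails around an LLM-backed agent with the
# open-source NeMo Guardrails library. The ./config directory (YAML + Colang
# files defining forbidden actions and topics) is assumed to exist.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")   # guardrail definitions (assumed)
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "Please share another employee's salary."}
])
print(response["content"])   # a well-configured rail refuses requests like this
```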
In addition, we provide many open-source blueprints for the ecosystem to use, which users are free to modify, covering many types of Agents. Today we are also announcing something very cool and smart: a brand-new family of models based on Llama, the NVIDIA Llama Nemotron language foundation models.
Llama 3.1 is a phenomenal model. Meta's Llama 3.1 has been downloaded around 650,000 times and has spawned some 60,000 derivative models. This is one of the core reasons nearly every enterprise and industry has started working on AI. We recognized that the Llama models could be fine-tuned to better fit enterprise use cases, and, leveraging our expertise and computing resources, we fine-tuned them into the Llama Nemotron open model suite.
These models come in different sizes: the small models respond quickly; the mainstream Llama Nemotron Super is the general-purpose workhorse; and the largest, Llama Nemotron Ultra, can serve as a teacher model, evaluating other models, generating answers and judging their quality, or acting as a source for knowledge distillation. All of these models are online now.
These models perform excellently, ranking near the top of leaderboards for chat, instruction following, and retrieval, and they are well suited to AI Agent workloads around the world.
Our collaboration with the ecosystem is also very close, including work with ServiceNow, SAP, and Siemens on industrial AI. Companies like Cadence and Perplexity are doing excellent work as well: Perplexity has disrupted search, and Codium is targeting the world's 30 million software engineers. AI assistants will greatly improve developer productivity, the next huge application area for AI services. With 1 billion knowledge workers globally, AI Agents could be the next robotics industry, a multi-trillion-dollar opportunity.
AI Agent Blueprints
Next, let's show some AI Agent blueprints that we have completed together with partners.
AI Agents are the new digital labor, able to assist or replace humans in completing tasks. NVIDIA's Agentic AI building blocks, NIM pre-trained models, and the NeMo framework help organizations easily develop and deploy AI Agents, which can be trained as experts in domain-specific tasks.
Here are four examples:
Research Assistant Agent: Able to read complex documents, such as lectures, journals, financial reports, and generate interactive podcasts for easy learning;
Software Security AI Agent: Helps developers continuously scan for software vulnerabilities and prompts them to take appropriate measures;
Virtual Lab AI Agent: Accelerates compound design and screening, quickly finding potential drug candidates;
Video Analytics AI Agent: Based on the NVIDIA Metropolis blueprint, it analyzes data from billions of cameras, generating interactive search, summaries, and reports. For example, it can monitor traffic flow and facility processes and recommend improvements.
The Dawn of the Physical AI Era
We aim to bring AI from the cloud to every corner, including inside companies and on personal PCs. NVIDIA is working to turn WSL 2 (Windows Subsystem for Linux 2) into the preferred platform for AI development, enabling developers and engineers to more conveniently use NVIDIA's AI stack, including language models, image models, animation models, and more.
Additionally, NVIDIA has launched Cosmos, the first physical world foundation model development platform, focused on understanding the dynamic properties of the physical world, such as gravity, friction, inertia, spatial relationships, and causality. It can generate videos and scenes that comply with physical laws, with wide applications in robot training, industrial AI, and multimodal language model training and validation.
Cosmos provides physical simulation through integration with the NVIDIA Omniverse, generating realistic and credible simulation results. This combination is the core technology for robot and industrial application development.
NVIDIA's industrial strategy is based on three computing systems:
DGX systems for training AI;
AGX systems for deploying AI;
Digital twin systems for reinforcement learning and AI optimization;
Through the synergistic work of these three systems, NVIDIA is driving the development of robotics and industrial AI, building the future digital world. It's not a three-body problem, but a "three-computer" solution.
Let me show you three examples of NVIDIA's robot vision.
1) Industrial Visualization Applications
Currently, there are millions of factories and tens of thousands of warehouses worldwide, forming the backbone of a $50 trillion manufacturing industry. In the future, all of it will need to be software-defined, automated, and integrated with robotics. We are collaborating with KION, a leading global warehouse automation solution provider, and Accenture, the world's largest professional services firm, to focus on digital manufacturing and create some very special solutions. Our go-to-market approach is similar to other software and technology platforms: through developers and ecosystem partners, and more and more of those partners are joining the Omniverse platform, because everyone wants to visualize the future of industry. Within this $50 trillion of global GDP, there is enormous waste and enormous opportunity for automation.
Let's look at this example of KION and Accenture collaborating with us:
KION (a supply chain solutions company), Accenture (a global professional services leader), and NVIDIA are bringing Physical AI to the trillion-dollar warehouse and distribution center market. Managing warehouse logistics efficiently means navigating a complex web of decisions shaped by constantly changing variables: daily and seasonal demand swings, space constraints, labor supply, and the integration of diverse robots and automation systems. Today, predicting the key performance indicators (KPIs) of a physical warehouse is nearly impossible.
To address these challenges, KION is adopting Mega, an NVIDIA Omniverse blueprint, to build an industrial digital twin in which to test and optimize its robot fleets. First, KION's warehouse management solution assigns tasks to the industrial AI brain inside the digital twin, for example moving goods from buffer locations into shuttle storage. The robot fleet in Omniverse's physically simulated warehouse then perceives, reasons, plans its next actions, and acts. The digital twin uses sensor simulation so the robot brains can see the state after each task and decide the next step. Under Mega's precise tracking, the loop continues, measuring operational KPIs such as throughput, efficiency, and utilization, all before anything is changed in the physical warehouse.
Together with NVIDIA, KION and Accenture are redefining the future of industrial autonomy.
In the future, every factory will have a digital twin that is fully synchronized with the actual factory. You can use Omniverse and Cosmos to generate a multitude of future scenarios, and AI will determine the optimal KPI scenario, which will serve as constraints and AI programming logic for actual factory deployment.
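In skeletal form, that "generate scenarios, simulate, pick the best KPI" loop might look like this (a conceptual sketch with invented stand-in functions, not the actual Omniverse or Mega API):

```python
# Conceptual sketch of the digital-twin optimization loop described above.
# The scenario model and simulator below are invented stand-ins,
# not the actual Omniverse/Mega API.
from dataclasses import dataclass
import random

@dataclass(frozen=True)
class Scenario:
    robots: int
    buffer_slots: int

def simulate_throughput(s: Scenario) -> float:
    # Stand-in for a physics-accurate digital-twin run:
    # more robots help until congestion around the buffers sets in.
    congestion = max(0, s.robots - s.buffer_slots) ** 2
    return s.robots * 10 - congestion + random.gauss(0, 1)

# Generate many candidate future scenarios, simulate each, keep the best KPI.
candidates = [Scenario(robots=r, buffer_slots=20) for r in range(5, 40)]
best = max(candidates, key=simulate_throughput)
print(f"configuration to deploy: {best} (chosen before touching the real warehouse)")
```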
2) Autonomous Vehicles
The autonomous driving revolution is here. After years of development, the success of Waymo and Tesla has proven the maturity of autonomous driving technology. Our solutions provide the industry with three computing systems: a system for training AI (such as the DGX system), a system for simulation testing and synthetic data generation (such as Omniverse and Cosmos), and an in-vehicle computing system (such as the AGX system). Almost all major automakers globally are collaborating with us, including Waymo, Zoox, Tesla, and the world's largest electric vehicle company, BYD. There are also companies like Mercedes, Lucid, Rivian, Xiaomi, and Volvo, which are about to launch innovative vehicle models. Aurora is using NVIDIA technology to develop autonomous trucks.
There are 100 million vehicles manufactured each year, and 1 billion vehicles on the roads globally, driving a total of trillions of miles annually. These will gradually become highly automated or fully autonomous. This industry is expected to become the first trillion-dollar robot industry.
Today, we are announcing the launch of our next-generation in-vehicle computer, Thor. It is a universal robot computer capable of handling the massive data from cameras, high-resolution radars, and LiDARs. Thor is an upgrade to the current industry standard Orin, with 20 times the computing power, and is now in full-scale production. Additionally, NVIDIA's Drive OS is the first AI computing operating system certified to the highest functional safety standard (ISO 26262 ASIL D).
Autonomous Driving Data Factory
NVIDIA leverages Omniverse AI models and the Cosmos platform to create an autonomous driving data factory, significantly expanding training data through synthetic driving scenarios. This includes:
OmniMap: Fusing map and geospatial data to build drivable 3D environments;
Neural Reconstruction Engine: Using sensor logs to build high-fidelity 4D simulation environments and produce scenario variants for training data;
Edify 3DS: Searching asset libraries or generating new assets to create scenarios for simulation.
Through these technologies, we are expanding thousands of driving scenarios into billions of miles of data for the development of safer and more advanced autonomous driving systems.
3) General Robotics
The era of general robotics is upon us. The key to a breakthrough in this field is training. For humanoid robots, imitation data is relatively hard to acquire, but NVIDIA's Isaac GR00T offers a solution: it generates massive datasets through simulation and combines the multiverse simulation engines of Omniverse and Cosmos for policy training, validation, and deployment.
For example, developers can teleoperate robots with Apple Vision Pro, capturing demonstrations without a physical robot and teaching task motions in a risk-free environment. Through Omniverse's domain randomization and 3D-to-real scene expansion capabilities, exponentially growing datasets can be generated, providing abundant resources for robot learning.
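Domain randomization itself is simple to picture: every simulated training episode perturbs physical and visual parameters so a policy cannot overfit to any single simulated world. A generic toy sketch (the parameters and ranges are invented, and this is not Isaac's API):

```python
# Toy sketch of domain randomization: each simulated training episode draws
# different physical/visual parameters so a policy learned in simulation
# does not overfit to one world. Generic illustration, not the Isaac API.
import random

def randomized_episode_params():
    return {
        "friction": random.uniform(0.4, 1.2),        # surface friction coefficient
        "light_intensity": random.uniform(0.2, 2.0), # scene lighting scale
        "object_mass_kg": random.uniform(0.1, 2.0),  # manipulated object's mass
        "camera_jitter_deg": random.gauss(0, 2.0),   # camera pose perturbation
    }

episodes = [randomized_episode_params() for _ in range(10_000)]
print(episodes[0])   # each episode presents a differently perturbed world
```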
In summary, whether it's industrial visualization, autonomous driving, or general robotics, NVIDIA's technology is leading the future transformation of the Physical AI and robotics domains.
Finally, I have one more important thing to share. All of this is built on a project we started inside the company ten years ago, called Project DIGITS, short for Deep Learning GPU Intelligence Training System.
Before the official launch, we shortened the name to DGX to harmonize with the company's RTX, AGX, OVX, and other product lines. The debut of the DGX-1 truly changed the course of AI development; it was a milestone in NVIDIA's contribution to AI's evolution.
The Revolutionary DGX-1
The original intent of the DGX-1 was to give researchers and startups an out-of-the-box AI supercomputer. Imagine: in the past, a supercomputer required its users to build dedicated facilities and design and construct complex infrastructure just to get it running. The DGX-1 was a supercomputer designed specifically for AI development, ready to use straight out of the box.
I still remember delivering the first DGX-1 in 2016 to a startup called OpenAI. Elon Musk, Ilya Sutskever, and many NVIDIA engineers were there, and we celebrated its arrival together. That machine significantly advanced AI computing.
Nowadays, AI is ubiquitous, and not only in research institutions and startup labs. As I said earlier, AI has become a new way of computing and of building software. Every software engineer, every creative artist, indeed every ordinary computer user needs an AI supercomputer. But I always wished the DGX-1 could be a little smaller.
The Latest AI Supercomputer
Here is NVIDIA's latest AI supercomputer. It still belongs to Project Digits, and we are still looking for a better name, so feel free to provide suggestions. This is a truly amazing device.
This supercomputer can run NVIDIA's full AI software stack, including DGX Cloud. It can serve as a cloud-connected supercomputer, a high-performance workstation, or even a desktop analytics machine. Most importantly, it is based on a new chip we developed in secret, codenamed GB110, the smallest Grace Blackwell we have ever made.
I have a chip here to show you its internal design. It was co-developed with MediaTek, a leading global SoC company. The CPU SoC is customized for NVIDIA and uses NVLink chip-to-chip interconnect to connect to the Blackwell GPU. This little chip is now in full production. We expect the supercomputer to officially launch around May.
We even offer a "double-power" configuration, connecting two of these devices through ConnectX with GPUDirect support. It is a complete supercomputing solution that can meet the needs of AI development, analytics, and industrial applications.
In addition, we announced the mass production of three new Blackwell system chips, the world's first physical AI foundation model, and three breakthroughs in the field of robotics - autonomous AI agent robots, humanoid robots, and self-driving cars.