NVIDIA's open-source AI model Nemotron 3 makes its debut, integrating hardware and software to support agent-based AI applications.


NVIDIA launched its Nemotron 3 series of open-source AI models on December 15th, targeting the practical deployment needs of agentic AI. The Nemotron 3 Nano, the first model in the series to become available, is positioned around high computational efficiency and low inference cost, aimed at the large workloads enterprises and developers run in multi-agent AI systems. NVIDIA describes the Nemotron 3 Nano as a core model designed to address cost, efficiency, and scalability.

For agentic AI, cost and efficiency are the key factors; NVIDIA positions Nano as its entry point.

Nvidia points out that enterprises generally face three major problems when implementing multi-agent AI systems:

  • Communication costs between agents are rising rapidly.

  • Long-running tasks are prone to context drift.

  • Inference costs are too high for large-scale deployment.

Against this backdrop, the Nemotron 3 Nano is positioned as the "main model for handling high-frequency, well-defined tasks," responsible for a large number of repetitive tasks such as software debugging, content summarization, information retrieval, and AI assistant processes, so that the overall system does not have to use large, cutting-edge models for every task.

(Note: Context drift means that the longer a task runs, the more likely the AI is to go off-topic, misunderstand key points, or even contradict itself.)

Nemotron 3 Nano specifications revealed: 30 billion parameters, but only about 3 billion active at a time.

In terms of technical architecture, the Nemotron 3 Nano adopts a Mixture-of-Experts (MoE) architecture:

  • Total number of parameters: approximately 30 billion.

  • Parameters activated per task: up to 3 billion.

  • Design goal: significantly reduce the computational load of inference while maintaining accuracy.

Nvidia explains that this design allows the model to "do a lot with a small brain," making it particularly suitable for tasks that are repeatedly called in multi-agent systems.
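The gating mechanism behind this "small brain" idea is the standard Mixture-of-Experts pattern: a small router scores all experts for each input, and only the top-scoring few actually run. Below is a minimal sketch in plain Python; the expert count, expert size, and logits are illustrative stand-ins chosen to match the roughly 30B-total / 3B-active ratio from the article, not Nemotron's real configuration:

```python
import math

def softmax(logits):
    """Convert router scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_logits, top_k=1):
    """Pick the top_k experts for one token and renormalize their weights.

    Only the selected experts execute, so per-token compute scales with
    top_k * params_per_expert rather than the full parameter count.
    """
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

# Illustrative accounting: 10 hypothetical experts of 3B parameters each.
NUM_EXPERTS, PARAMS_PER_EXPERT, TOP_K = 10, 3_000_000_000, 1
total_params = NUM_EXPERTS * PARAMS_PER_EXPERT   # ~30B stored in the model
active_params = TOP_K * PARAMS_PER_EXPERT        # ~3B actually run per token

selected = route([0.2, 1.5, -0.3, 0.9, 0.1, -1.0, 0.4, 0.0, 2.1, -0.5], TOP_K)
```

The point of the accounting is that memory holds all 30B parameters, but the matrix multiplications for any given token touch only the active 3B, which is where the inference savings come from.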

Performance comparison with Nemotron 2: up to 4 times the throughput, up to 60% fewer reasoning tokens.

Compared to the previous generation Nemotron 2 Nano, NVIDIA states that the new architecture brings significant improvements:

  • Token throughput increases by up to 4 times.

  • Reasoning-token generation drops by up to 60%.

  • The overall cost of inference has decreased significantly.

This makes the Nemotron 3 Nano the most computationally cost-efficient open model in NVIDIA's current lineup.

A one-million-token context window improves stability in long-running tasks.

The Nemotron 3 Nano features a context window of 1 million tokens, allowing it to remember more background information in a single workflow. Nvidia points out that this design helps:

  • Connect long processes and multi-step tasks.

  • Reduce the risk of AI agents losing context during long-running operation.

  • Improve the accuracy of information retrieval and summarization tasks.

This is a crucial foundation for improving stability in enterprise-level AI assistants and automated processes.

Third-party reviews affirm: one of the most open and efficient models in its class.

An assessment by independent AI benchmarking organization Artificial Analysis indicates that the Nemotron 3 Nano is one of the "most open" models among those of similar size, and it leads in efficiency and accuracy.

Nvidia also emphasizes that openness is the core design philosophy of the Nemotron series, allowing developers to fine-tune and customize it according to their own needs.

Available now, with priority given to supporting the development and deployment ecosystem.

In terms of practical use, the Nemotron 3 Nano has already been launched:

  • Model platform: Hugging Face

  • Inference services: Baseten, Deepinfra, Fireworks, FriendliAI, OpenRouter, Together AI

  • Tool support: LM Studio, llama.cpp, SGLang, vLLM

Meanwhile, the Nemotron 3 Nano is also available as an NVIDIA NIM microservice, which can be deployed on any NVIDIA acceleration infrastructure, allowing enterprises to scale up their applications while maintaining privacy and control.
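NIM microservices expose an OpenAI-compatible HTTP interface, so "calling the API" amounts to a standard chat-completions request. A stdlib-only sketch follows; the base URL and model identifier are placeholders for illustration, not official values:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def call_nim(base_url: str, payload: dict) -> dict:
    """POST the payload to a NIM deployment (base_url is a placeholder)."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Building the payload does not require a running endpoint;
# the model name here is a hypothetical identifier.
payload = build_chat_request("nvidia/nemotron-3-nano",
                             "Summarize this incident report.")
```

Because the request shape matches the OpenAI API, existing client libraries and agent frameworks can typically point at a NIM deployment by swapping the base URL.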

(Note: NVIDIA NIM™ provides enterprises with ready-to-use AI model microservices; enterprises can simply call an API to use the model without handling underlying performance tuning themselves.)

With cloud and enterprise platforms gradually in place, Nano serves as the core foundational layer for agent-based AI.

Nvidia stated that the Nemotron 3 Nano will serve as the "foundational layer model" in enterprise agent-based AI architectures:

  • AWS: Coming soon to Amazon Bedrock

  • Other platforms: Google Cloud, CoreWeave, Microsoft Foundry, Nebius, Nscale, Yotta (planned)

  • Enterprise AI platforms: Couchbase, DataRobot, H2O.ai, JFrog, Lambda, UiPath

By having Nano handle the bulk of routine inference tasks, companies can delegate more complex tasks to larger models within the same workflow, optimizing overall token economics.
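A tiered setup like this reduces to a simple routing rule. The model names, task categories, and per-call costs below are hypothetical, purely to illustrate the token-economics argument:

```python
# Hypothetical tiered router: routine, well-defined tasks go to the small
# model; open-ended tasks escalate to a larger one.
ROUTINE_TASKS = {"summarize", "retrieve", "debug", "classify"}
COST_PER_CALL = {"nano": 1, "frontier": 20}  # arbitrary illustrative units

def pick_model(task_type: str) -> str:
    """Route high-frequency, well-defined tasks to the small model."""
    return "nano" if task_type in ROUTINE_TASKS else "frontier"

def workflow_cost(tasks):
    """Total cost of a workflow under tiered routing."""
    return sum(COST_PER_CALL[pick_model(t)] for t in tasks)

# A mostly routine workflow: 9 routine calls plus 1 escalation.
tasks = ["summarize"] * 4 + ["retrieve"] * 5 + ["plan"]
tiered = workflow_cost(tasks)
all_frontier = len(tasks) * COST_PER_CALL["frontier"]
```

Under these made-up numbers the tiered workflow costs 29 units versus 200 for sending everything to the large model; the real savings depend on actual pricing and on how much of a workload is genuinely routine.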


This article, "NVIDIA's New Nemotron 3 Open-Source AI Model Supports Agentic AI Applications Through Hardware and Software Integration," first appeared on ABMedia.

Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.