NVIDIA Launches Nemotron 3 Super — 120 Billion Parameter Open AI Model for Agentic Intelligence

NVIDIA (NASDAQ: NVDA) has officially released Nemotron 3 Super — the company's most capable open AI model to date and a landmark moment in the race to build the world's most efficient large-scale reasoning systems. The new Super model is a 120B total, 12B active-parameter model that delivers maximum compute efficiency and accuracy for complex multi-agent applications such as software development and cybersecurity triaging. With fully open weights, datasets, and training recipes, Nemotron 3 Super is NVIDIA's boldest challenge yet to closed-source giants like OpenAI and Google.

What Is Nemotron 3 Super? The Key Numbers

Nemotron 3 Super is a 120B total parameter model with 12B parameters active per token — a design that delivers the reasoning power of a 120B dense model at the inference cost of a 12B model. It delivers over 5x higher throughput than the previous Nemotron Super and tackles the "context explosion" problem with a native 1M-token context window that gives agents long-term memory for aligned, high-accuracy reasoning. The model is fully open, with open weights, datasets, and recipes, so developers can easily customise, optimise, and deploy it on their own infrastructure.

The training foundation is formidable: the model saw 25 trillion tokens in total, including 10 trillion curated pretraining tokens, plus an additional 10 billion tokens focused on reasoning and 15 million coding problems — all aggressively deduplicated and quality-filtered to maximise signal-to-noise ratio.
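The aggressive deduplication mentioned above can be illustrated with a minimal hash-based filter. This is a sketch only — production pretraining pipelines typically add fuzzy (near-duplicate) filtering such as MinHash/LSH, and nothing here reflects NVIDIA's actual data pipeline:

```python
import hashlib

def dedupe_exact(documents):
    """Drop exact duplicate documents by normalised content hash.

    Illustrative only: real corpus curation also removes
    near-duplicates and applies quality filters.
    """
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["The quick brown fox.", "the quick brown fox.", "A different doc."]
print(dedupe_exact(corpus))  # keeps 2 of the 3 documents
```

Even this trivial normalise-then-hash pass collapses case-only duplicates; at trillion-token scale the same idea is applied with sharded, approximate data structures.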

The Architecture Breakthrough: Hybrid Mamba-Transformer MoE

Super builds on the same hybrid philosophy as Nano but at a fundamentally different scale. The backbone interleaves three layer types:

  • Mamba-2 layers handle the majority of sequence processing. These state space model (SSM) layers provide linear-time complexity with respect to sequence length, which is what makes the 1M-token context window practical rather than theoretical.
  • Selective Transformer attention layers handle positions where global context integration is critical.
  • Sparse MoE routing activates only a fraction of expert networks per token, keeping active compute efficient even as total parameters scale.
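The sparse-routing idea behind the 120B-total / 12B-active split can be sketched as a top-k gating function. The expert count and top-k value below are illustrative assumptions, not Nemotron's actual configuration:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, top_k=2):
    """Select the top_k experts for one token and renormalise their gates.

    Only the selected experts run, so active compute per token is roughly
    (top_k / num_experts) of the dense equivalent -- the same principle
    that lets a 120B-total model run with ~12B active parameters.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(8)]  # 8 experts (assumed)
gates = route_token(logits, top_k=2)
print(gates)  # two experts selected; gate weights sum to 1
```

The token's output is then the gate-weighted sum of the chosen experts' outputs; the other experts never execute for that token.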

This hybrid architecture means Nemotron 3 Super can fit on just two H100 GPUs while delivering the reasoning depth required for enterprise automation — a remarkable achievement for a 120B parameter model that opens up deployment to organisations that cannot afford dedicated multi-node clusters. For a deep technical walkthrough of Nemotron 3 Super's architecture, NVIDIA's own official developer blog provides the most authoritative and comprehensive reference available.
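A back-of-the-envelope memory check makes the two-GPU claim concrete. Assuming the 80GB H100 variant and counting weight storage only (KV cache, activations, and runtime overhead add more), the figure plausibly refers to the reduced-precision checkpoints listed later in this article:

```python
# Weight memory for a 120B-parameter model at the precisions NVIDIA
# publishes for Nemotron 3 (BF16, FP8, NVFP4). Weights only -- KV cache
# and activations are deliberately ignored in this rough estimate.
PARAMS = 120e9
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}
H100_MEMORY_GB = 80  # per-GPU HBM on an 80GB H100

for precision, nbytes in BYTES_PER_PARAM.items():
    weight_gb = PARAMS * nbytes / 1e9
    gpus = weight_gb / H100_MEMORY_GB
    print(f"{precision}: {weight_gb:.0f} GB of weights (~{gpus:.1f}x H100-80GB)")
```

By this estimate, FP8 weights (120 GB) fit comfortably across two 80GB H100s and NVFP4 (60 GB) fits on one, while BF16 (240 GB) would need more hardware or offloading.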

Benchmark Performance: Best Open Model in Its Class

On PinchBench — a new benchmark specifically designed to evaluate how well LLMs perform as the "brain" of an OpenClaw agent — Nemotron 3 Super scores 85.6% across the full test suite, making it the best open model in its class for agentic reasoning tasks.

On broader benchmarks, the Super model delivers up to 3.3x higher throughput than comparable models such as Qwen3-30B — while achieving leading scores on the Artificial Analysis Intelligence Index. The model's reasoning control features include a "Thinking Budget" that allows developers to explicitly set token budgets for reasoning, enabling precise trade-offs between inference speed and answer quality.
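In practice, a thinking-budget control amounts to capping reasoning tokens separately from the final answer. The sketch below shows what such a request could look like; the model id and the `max_thinking_tokens` field are hypothetical placeholders, not NVIDIA's documented API — consult the actual Nemotron serving docs for real parameter names:

```python
def build_request(prompt, thinking_budget, max_output_tokens=512):
    """Assemble a chat request that caps reasoning tokens separately
    from the answer. Field names here are illustrative assumptions."""
    if thinking_budget < 0:
        raise ValueError("thinking budget must be non-negative")
    return {
        "model": "nemotron-3-super",  # placeholder model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_output_tokens,
        "extra_body": {
            # Larger budgets trade latency for deeper reasoning;
            # a budget of 0 would disable extended thinking entirely.
            "max_thinking_tokens": thinking_budget,
        },
    }

fast = build_request("Summarise this log file.", thinking_budget=0)
deep = build_request("Plan a multi-step refactor.", thinking_budget=4096)
print(fast["extra_body"], deep["extra_body"])
```

The point of the pattern is that the caller, not the model, decides how much latency to spend on reasoning for each request.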

The Full Nemotron 3 Family: Nano, Super and Ultra

Nemotron 3 is a three-tier open model family designed to cover the full spectrum of enterprise AI deployment needs:

  • Nemotron 3 Nano (30B total / 3B active): Available now — optimised for targeted, highly efficient tasks including software debugging, content summarisation, AI assistant workflows, and information retrieval at low inference costs. Delivers 4x higher token throughput than Nemotron 2 Nano.
  • Nemotron 3 Super (120B total / 12B active): Released now — high-accuracy reasoning model for multi-agent applications, software development, cybersecurity triaging, and complex enterprise automation. 5x throughput over previous Super.
  • Nemotron 3 Ultra (~500B total / 50B active): Coming H1 2026 — the flagship reasoning engine for complex AI applications and deep agentic orchestration, featuring LatentMoE, which allows the model to access 4x more experts at the same inference cost.

Open Innovation Stack: Not Just a Model — A Full Platform

What sets Nemotron 3 Super apart from other large model releases is NVIDIA's commitment to end-to-end openness — releasing not just model weights but the entire development and training ecosystem.

Along with the models, NVIDIA is releasing three trillion tokens of new pretraining, post-training, and reinforcement learning data, as well as the open-source libraries NeMo Gym and NeMo RL.

Model weights are available on Hugging Face in BF16, FP8, and NVFP4 precisions. Nemotron 3 Nano is already live on Amazon Bedrock and SageMaker JumpStart as of February 2026, with support across Google Cloud, Microsoft Azure, and CoreWeave rolling out. For developers, NVIDIA provides "cookbooks" and the NeMo Evaluator to validate safety and performance — making it easier than ever to transition from prototype to production-grade agentic system.

Jensen Huang's Vision: Open Innovation as AI's Foundation

"Open innovation is the foundation of AI progress," said Jensen Huang, founder and CEO of NVIDIA. "With Nemotron, we're transforming advanced AI into an open platform that gives developers the transparency and efficiency they need to build agentic systems at scale."

By committing to a Nemotron roadmap, putting its own training recipes in the open, and treating models as libraries you version and ship, NVIDIA is trying to define how serious AI software should be built. For customers deciding where to place their own multi-billion-dollar bets on AI infrastructure, that story is every bit as important as raw TOPS.

Key Facts at a Glance

  • Model Name: NVIDIA Nemotron 3 Super
  • Total Parameters: 120 billion
  • Active Parameters per Token: 12 billion
  • Architecture: Hybrid Mamba-Transformer Mixture-of-Experts (MoE)
  • Context Window: 1 million tokens (native)
  • Throughput Improvement: 5x over previous Nemotron Super
  • PinchBench Score: 85.6% (best open model in class)
  • GPU Deployment: Fits on 2x H100 GPUs
  • Pretraining Data: 25 trillion total tokens seen; 10T curated; 15M coding problems
  • License: NVIDIA Open Model License (open weights, datasets, recipes)
  • Availability: build.nvidia.com, Hugging Face (BF16, FP8, NVFP4)
  • Cloud Support: AWS Bedrock, SageMaker, Google Cloud, Azure, CoreWeave
  • Open Libraries Released: NeMo Gym, NeMo RL
  • Training Data Released: 3 trillion tokens (pretraining + post-training + RL)
  • Primary Use Cases: Software development, cybersecurity triaging, multi-agent enterprise automation

Conclusion

The launch of NVIDIA Nemotron 3 Super represents a decisive escalation in the open AI model race. By combining a 120B parameter scale with 12B active parameter efficiency, a 1M-token context window, 5x throughput improvement, and a fully open development stack — all deployable on just two H100 GPUs — NVIDIA has delivered a model that gives enterprises no reason to depend on closed-source alternatives for complex agentic reasoning tasks.

With Nemotron 3 Ultra still to come in H1 2026, and the NeMo Gym ecosystem enabling developers to fine-tune and specialise these models for virtually any domain, NVIDIA is no longer just the chip company that powers AI — it is becoming the open platform that defines how AI agents think, reason, and act. For developers ready to build with Nemotron 3 Super today, start at NVIDIA's official developer blog and NVIDIA's Hugging Face page for model weights, cookbooks, and deployment guides.