Nvidia Nemotron 3 Super: $26B Open-Weight AI Bet
Nvidia Nemotron 3 Super is a 120B-parameter open-weight model built for multi-agent AI, released in March 2026 as part of a $26 billion, five-year open-source investment plan.

What to Know
- Nvidia Nemotron 3 Super is a 120-billion-parameter open-weight model running only 12B active parameters via a mixture-of-experts design
- The model ships with a 1-million-token context window and was pretrained natively in NVFP4 4-bit floating-point format from day one
- Nvidia's $26 billion five-year open-weight AI investment is confirmed by a 2025 financial filing, with a 550B-parameter model already pretrained
- Chinese open models climbed from 1.2% of global open-model usage in late 2024 to roughly 30% by end of 2025 — Qwen overtook Llama as the top self-hosted model
Nvidia Nemotron 3 Super landed this week — a 120-billion-parameter open-weight model built specifically for multi-agent AI workflows, and the clearest signal yet that Nvidia isn't content staying a chip company. The release sits inside a $26 billion, five-year commitment to open-weight models that most people missed. They shouldn't have.
What Is Nvidia Nemotron 3 Super?
A hybrid architecture built for agents, not chatbots
Nvidia Nemotron 3 Super is a 120-billion-parameter model that activates only 12 billion parameters at inference time, using a mixture-of-experts (MoE) architecture to keep compute costs low without sacrificing reasoning depth. It ships with a 1-million-token context window — enough to hold roughly 750,000 words, or an entire large codebase, in memory without context collapse. Per Nvidia's developer blog, the model targets multi-agent workflows, where token costs compound fast: every tool call, every reasoning step, every slice of retrieved context gets re-sent from scratch, so total inference cost grows far faster than in a simple chat session.
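The compounding cost is easy to see with a back-of-envelope sketch. The numbers below are illustrative assumptions, not Nvidia's figures; the point is that when every agent step re-sends the whole accumulated context, total tokens processed grow roughly quadratically with step count.

```python
def tokens_processed(steps: int, tokens_per_step: int = 2_000) -> int:
    """Total prompt tokens processed when each step re-sends all prior context."""
    total = 0
    context = 0
    for _ in range(steps):
        context += tokens_per_step  # new tool output / reasoning appended
        total += context            # whole context re-sent on this step
    return total

# A hypothetical 50-step agent run, 2,000 new tokens per step:
final_context = 50 * 2_000                     # 100,000 tokens of final state
print(tokens_processed(50))                    # 2,550,000 tokens processed
print(tokens_processed(50) / final_context)    # 25.5x amplification
```

That amplification factor is why a cheap-per-token MoE with a long context window matters more for agent pipelines than for single-turn chat.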
The architecture combines three components that rarely appear together: Mamba-2 state-space layers for fast, memory-efficient long-token processing; standard Transformer attention layers for precise recall; and a new 'Latent MoE' design that compresses token embeddings before expert routing. That last piece allows the model to activate four times as many specialists at the same compute cost.
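The Latent MoE idea can be sketched in a few lines: project token embeddings into a smaller latent space first, then score experts there, so routing over many experts costs less. This is a conceptual illustration with made-up dimensions, not Nvidia's actual architecture or code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, n_experts, top_k = 1024, 256, 32, 4  # illustrative sizes

# Compress token embeddings, then route in the cheaper latent space.
W_down = rng.standard_normal((d_model, d_latent)) * 0.02   # compression projection
W_route = rng.standard_normal((d_latent, n_experts)) * 0.02  # latent-space router

def route(x: np.ndarray) -> np.ndarray:
    """Return indices of the top-k experts chosen for one token embedding x."""
    z = x @ W_down                       # (d_latent,) compressed embedding
    logits = z @ W_route                 # (n_experts,) expert scores
    return np.argsort(logits)[-top_k:]   # pick the k highest-scoring experts

token = rng.standard_normal(d_model)
chosen = route(token)   # 4 distinct expert ids out of 32
```

Because the routing matmul runs at `d_latent` instead of `d_model`, the same routing budget can score roughly four times as many experts, which is the trade-off the article describes.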
Training methodology sets this apart. Nemotron 3 Super was pretrained natively in NVFP4 — Nvidia's own 4-bit floating-point format — learning to operate within 4-bit arithmetic from the first gradient update, rather than being quantized after training. Quantizing after training often degrades reasoning quality; training in 4-bit from the start avoids that penalty. The result: more than five times the throughput of its predecessor, 2.2x faster inference than OpenAI's GPT-OSS 120B, and 7.5x faster than Alibaba's Qwen3.5-122B.
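To see why post-training quantization is risky, consider how coarse a 4-bit grid is: only 16 representable values. The toy below uses a generic symmetric int4 scheme for illustration; NVFP4's actual floating-point block format is more sophisticated and is not reproduced here.

```python
import numpy as np

def quantize_int4(w: np.ndarray) -> np.ndarray:
    """Round weights onto a symmetric 4-bit grid: 16 levels in [-8, 7] * scale."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7)  # integers in [-8, 7]
    return q * scale                         # back to real-valued weights

rng = np.random.default_rng(1)
w = rng.standard_normal(4096)   # stand-in for a trained weight tensor
w4 = quantize_int4(w)

n_levels = np.unique(w4).size          # at most 16 distinct values survive
error = float(np.mean((w - w4) ** 2))  # nonzero rounding error
```

Snapping fully-trained weights onto that grid introduces error the model never learned to compensate for; training natively in 4-bit lets the optimizer adapt to the grid from the start.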
The $26 Billion Commitment Behind One Model
Nemotron 3 Super isn't a standalone product launch — it's a data point in a much larger strategy. A 2025 financial filing shows Nvidia is committing $26 billion over five years to open-weight AI models. Bryan Catanzaro, Nvidia's VP of applied deep learning research, confirmed the scope: the company recently wrapped pretraining on a 550-billion-parameter model. Nemotron isn't the ceiling. It's the floor.
The full training pipeline is public: weights on Hugging Face, 10 trillion curated pretraining tokens from 25 trillion total seen during training, 40 million post-training samples, and reinforcement learning recipes spanning 21 environment configurations. Perplexity, Palantir, Cadence, and Siemens are already integrating the model. Nvidia first shipped a Nemotron model in November 2023 — that filing makes clear this is no longer a side project.
The strategic logic isn't subtle. Nvidia's GPUs are the default infrastructure for training and running frontier models. If developers build pipelines on Nemotron — optimized for Nvidia hardware from the ground up — that's a retention play dressed in open-source packaging. Call it open-washing if you like, but the weights are public, the datasets are public, the recipes are public. That's more than Meta has done lately.
Is America Losing the Open-Source AI Race?
Chinese models went from 1.2% to 30% of global usage in one year
Here's the number that should focus attention: Chinese open models went from roughly 1.2% of global open-model usage in late 2024 to approximately 30% by end of 2025, according to research from OpenRouter and Andreessen Horowitz. Alibaba's Qwen3.5 series overtook Meta's Llama as the most-used self-hosted open-source model globally. American companies — including Airbnb — adopted it for customer service. Startups worldwide are building on top of it. That level of adoption doesn't reverse quickly.
A Brookings Institution report published Monday frames the divergence clearly. The U.S. is running an AGI race while China is running an adoption race — prioritizing efficiency, global reach, and embedding AI into real-world systems. DeepSeek, Alibaba, and others have been flooding the open ecosystem with their best models while OpenAI, Anthropic, and Google keep theirs gated behind APIs. Meta was the one major American counterweight in open source. Then Zuckerberg signaled the company might not make future models fully open.
The gap between best proprietary model and best open model used to be wide, and it used to favor America. That gap has nearly closed, and the open side of the ledger is increasingly Chinese.
There's a hardware dimension underneath all of this. A new DeepSeek model is widely expected soon, rumored to have been trained on chips made by Huawei — a sanctioned Chinese company. If confirmed, that gives developers worldwide a concrete reason to test Huawei's hardware stack. China's Zhipu AI is already doing it. The scenario Nvidia most needs to prevent: Chinese open models and Chinese chips forming an ecosystem that doesn't need Nvidia's GPUs at all.
The U.S., meanwhile, remains fixated on the race to AGI, or artificial general intelligence, with American tech companies pouring hundreds of billions into that single goal.
What Does This Mean for AI Developers?
For engineers running multi-agent systems, the efficiency math is compelling. A model activating 12 billion parameters while drawing on 120 billion total means near-frontier reasoning at a fraction of frontier cost. The 1-million-token context window means agents can hold state across long tasks without constant resets — one of the most painful failure modes in production pipelines.
Nvidia's internal benchmarks showed the model catching errors in context without being prompted to, handling math and logic cleanly, and holding up on prompts that were deliberately vague or factually wrong. Robustness under bad inputs matters in real agentic deployments. Perplexity and Palantir will provide the real-world stress test soon enough.
Nemotron 3 Super answers a direct question about where Nvidia is heading: not just chips, not just hardware, but the full stack. Models, training recipes, deployment tooling, and now the narrative around open-source AI leadership. Whether that's enough to slow China's momentum in open-weight models — that's a different question entirely.
Frequently Asked Questions
What is Nvidia Nemotron 3 Super?
Nvidia Nemotron 3 Super is a 120-billion-parameter open-weight AI model that activates only 12 billion parameters at inference time using a mixture-of-experts design. It features a 1-million-token context window and was built for multi-agent AI workflows, offering 2.2x faster inference than OpenAI's GPT-OSS 120B and 7.5x faster than Alibaba's Qwen3.5-122B.
How much is Nvidia investing in open-weight AI models?
Nvidia's 2025 financial filing confirms a planned $26 billion investment over five years in open-weight AI models. VP Bryan Catanzaro confirmed the company recently finished pretraining a 550-billion-parameter model, indicating Nemotron 3 Super is an early milestone in a large-scale, long-term open-source AI strategy tied directly to Nvidia hardware adoption.
How does Nemotron 3 Super compare to Qwen and GPT open models?
Nemotron 3 Super delivers 2.2x faster inference throughput than OpenAI's GPT-OSS 120B and 7.5x faster than Alibaba's Qwen3.5-122B. It also surpasses its own predecessor by more than 5x on throughput, while maintaining strong accuracy on reasoning, math, and error-detection tasks including vague or malformed input prompts.
Why is Nvidia releasing open-weight AI models?
Nvidia's open-weight strategy is a hardware retention play: models optimized for Nvidia's NVFP4 format and GPU stack incentivize developers to stay on Nvidia infrastructure. It also counters the rise of Chinese open models like Alibaba's Qwen, which grew from 1.2% to 30% of global open-model usage between late 2024 and end of 2025, threatening Nvidia's ecosystem dominance.
