🔴 🤖 Models Wednesday, April 29, 2026 · 2 min read

NVIDIA Nemotron 3 Nano Omni: open multimodal model 30B-A3B MoE with 256K context, 9× higher throughput than competitors

Editorial illustration: a multimodal AI system unifying video, audio, and text through a hybrid mixture-of-experts architecture

Why it matters

Nemotron 3 Nano Omni is NVIDIA's new open multimodal model that unifies vision, speech, and language in a single 30B-A3B hybrid mixture-of-experts system with 256K context. It achieves top accuracy on six leaderboards for document intelligence and audio-video understanding, with 9× higher throughput than other open omni models at the same interactivity level. Available immediately on HuggingFace, OpenRouter, NVIDIA NIM, and 25+ partner platforms; Foxconn, Palantir, and six other companies are already using the model in production.

On April 28, 2026, NVIDIA introduced Nemotron 3 Nano Omni — an open multimodal model that combines vision, speech, and language in a single system. The model is positioned as a “perception sub-agent” that pairs with the larger Nemotron 3 Super and Ultra models: Nano handles real-time video and audio understanding, while Super/Ultra take over more complex reasoning. With this, NVIDIA addresses a concrete problem of production AI agents — latency in multimodal pipelines where input is routed through separate ASR, vision encoder, and text LLM components.
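The latency argument can be made concrete: in a cascaded pipeline the stage latencies add up sequentially, while a unified omni model makes a single forward pass over all modalities. A minimal sketch; the per-stage timings below are hypothetical placeholders, not measurements from the announcement.

```python
# Hypothetical per-stage latencies (ms) in a cascaded multimodal pipeline,
# where input is routed through separate ASR, vision, and LLM components.
pipeline_stages = {
    "ASR (speech -> text)": 300,
    "vision encoder": 150,
    "text LLM (time to first token)": 400,
}

# Stages run sequentially, so end-to-end latency is the sum.
cascade_latency_ms = sum(pipeline_stages.values())

# A unified omni model ingests audio/video/text in one pass; here we assume
# its first-token latency is comparable to the LLM stage alone (hypothetical).
unified_latency_ms = 450

print(f"Cascade: {cascade_latency_ms} ms to first token")
print(f"Unified: {unified_latency_ms} ms to first token")
```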

What’s in the architecture?

30B-A3B hybrid mixture-of-experts — 30 billion parameters total, 3 billion active per token. 256K token context. Specific components: Conv3D (3D convolution for video) and EVS (Efficient Video Sampling). Input modalities: text, images, audio, video, documents, charts, and interfaces (GUI screenshots). Output: text.
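A rough sketch of why the A3B design matters for cost: in a mixture-of-experts model, per-token compute scales with the active parameters, not the total. The arithmetic below is illustrative and uses only the figures stated above.

```python
# Illustrative arithmetic: per-token compute of a 30B-A3B MoE vs. a dense
# 30B model. Forward-pass FLOPs per token are roughly 2 * active_params.

TOTAL_PARAMS = 30e9   # 30B parameters stored (memory footprint)
ACTIVE_PARAMS = 3e9   # 3B parameters used per token (compute cost)

dense_flops_per_token = 2 * TOTAL_PARAMS
moe_flops_per_token = 2 * ACTIVE_PARAMS

# The MoE does ~10x less compute per token than a dense model of the same
# size, while keeping the full 30B capacity available to the expert router.
compute_ratio = dense_flops_per_token / moe_flops_per_token
print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
print(f"Dense-vs-MoE compute per token: {compute_ratio:.0f}x")
```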

What numbers is NVIDIA putting on the table?

The model leads six leaderboards covering document intelligence and audio-video understanding. The headline figure: 9× higher throughput than other open omni models at the same interactivity level (latency budget). NVIDIA argues this directly reduces the cost of production agents, since fewer GPU hours are required per unit of work.
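The throughput-to-cost argument translates into simple arithmetic: at a fixed latency budget, cost per request scales inversely with throughput. In the sketch below, only the 9× figure comes from the announcement; the GPU price and baseline throughput are made-up placeholders.

```python
# Back-of-the-envelope serving cost. The 9x speedup is NVIDIA's claim;
# the GPU rental price and baseline throughput are hypothetical.

GPU_HOUR_USD = 2.50           # hypothetical GPU rental price
BASELINE_REQ_PER_HOUR = 1000  # hypothetical throughput of a comparison model
SPEEDUP = 9                   # claimed advantage at equal interactivity

cost_baseline = GPU_HOUR_USD / BASELINE_REQ_PER_HOUR
cost_nano = GPU_HOUR_USD / (BASELINE_REQ_PER_HOUR * SPEEDUP)

print(f"Baseline:  ${cost_baseline:.4f} per request")
print(f"Nano Omni: ${cost_nano:.4f} per request ({SPEEDUP}x cheaper)")
```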

Who is already using it?

NVIDIA has announced concrete enterprise clients that have moved from evaluation to production: Aible, Applied Scientific Intelligence (ASI), Eka Care, Foxconn, H Company, Palantir, and Pyler. Use cases: customer support, document analysis, and computer interface navigation (GUI agents). Additional companies are evaluating the model: Dell Technologies, DocuSign, Infosys, K-Dense, Lila, Oracle, and Zefr.

Where is it available?

HuggingFace, OpenRouter, NVIDIA NIM (build.nvidia.com as a microservice), and 25+ partner platforms — including day-zero availability on Amazon SageMaker JumpStart. NVIDIA’s distribution move is aggressive: the model is simultaneously open weights (HF), an inference API (OpenRouter), NVIDIA’s own service (NIM), and a hyperscaler partnership (AWS).
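Since OpenRouter and NIM expose models through OpenAI-compatible endpoints, a multimodal request is just a standard chat-completions payload with mixed content parts. A minimal sketch: the model slug is a guess (the announcement does not give the exact identifier), and the request is only constructed here, not sent.

```python
import json

# Sketch of an OpenAI-compatible multimodal chat request, as accepted by
# OpenRouter-style endpoints. The model slug below is hypothetical.
MODEL_ID = "nvidia/nemotron-3-nano-omni"  # assumed slug, not confirmed

payload = {
    "model": MODEL_ID,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Summarize the chart in this screenshot."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    "max_tokens": 512,
}

# This payload would be POSTed to the provider's /chat/completions endpoint
# with an "Authorization: Bearer <API_KEY>" header.
print(json.dumps(payload, indent=2)[:120])
```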

🤖

This article was generated using artificial intelligence from primary sources.