NVIDIA unveils Nemotron 3 Nano Omni model, enhancing AI agents’ efficiency by 9x

28 Apr 2026 21:29

Photo credit: blogs.nvidia.com

Unveiled today, the NVIDIA Nemotron 3 Nano Omni is an open multimodal model that brings vision, audio and language capabilities together in one system, allowing agents to respond faster and reason across video, audio, image and text.

This best-in-class model gives enterprises and developers a production path for more efficient and accurate multimodal AI agents with full deployment flexibility and control, News.Az reports, citing foreign media.

Nemotron 3 Nano Omni sets a new efficiency frontier for open multimodal models with leading accuracy and low cost, topping six leaderboards for complex document intelligence, and video and audio understanding.

AI and software companies already adopting Nemotron 3 Nano Omni include Aible, Applied Scientific Intelligence (ASI), Eka Care, Foxconn, H Company, Palantir and Pyler, with Dell Technologies, DocuSign, Infosys, K-Dense, Lila, Oracle and Zefr evaluating the model.

“To build useful agents, you can’t wait seconds for a model to interpret a screen,” said Gautier Cloix, CEO of H Company. “By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn’t practical before. This isn’t just a speed boost: It’s a fundamental shift in how our agents perceive and interact with digital environments in real time.”

Nemotron 3 Nano Omni Enables Faster, Leaner Multimodal Agents

Consider an AI agent for customer support processing a screen recording while analyzing uploaded call audio and checking data logs, or an agent for finance tasked with parsing PDFs, spreadsheets, charts and voice notes. Today, most agentic systems accomplish these tasks with separate models for vision, speech and language.

This approach increases latency through repeated inference passes, fragments context across modalities, and adds cost and inaccuracies over time.
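
As a rough illustration of the latency point, the toy calculation below compares a chain of separate perception models with a single multimodal pass. The function names and every number are hypothetical placeholders chosen for the example, not measurements of Nemotron 3 Nano Omni or any other model.

```python
# Toy latency model. Every number below is a made-up placeholder,
# not a benchmark of any model named in this article.

def pipeline_latency_ms(stage_latencies_ms, handoff_ms):
    """Total latency when each modality runs through its own model in sequence.

    handoff_ms is an assumed fixed cost for passing results between stages.
    """
    handoffs = max(len(stage_latencies_ms) - 1, 0)
    return sum(stage_latencies_ms) + handoffs * handoff_ms


def unified_latency_ms(single_pass_ms):
    """Latency when one model handles all modalities in one inference pass."""
    return single_pass_ms


# Hypothetical stages: vision, speech, language.
separate = pipeline_latency_ms([120, 90, 200], handoff_ms=15)
unified = unified_latency_ms(250)

print(f"separate models: {separate} ms, unified model: {unified} ms")
```

The sketch only captures the additive cost of repeated passes and handoffs; it does not model the context fragmentation or accuracy effects described above.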

By combining vision and audio encoders within its 30B-A3B hybrid mixture-of-experts architecture, Nemotron 3 Nano Omni eliminates the need for separate perception models, improving efficiency at scale. As the first open model to deliver both this level of efficiency and strong multimodal perception accuracy, it enables AI systems to achieve up to 9x higher throughput than other open omni models with similar interactivity. The result is lower cost and better scalability — without sacrificing responsiveness or quality.
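
The "30B-A3B" label is commonly read as roughly 30 billion total parameters with about 3 billion active per token. The article does not spell this out, so treat that reading as an assumption. Under it, a quick back-of-the-envelope calculation suggests why a mixture-of-experts model can cost less to run than its total size implies:

```python
# Assumption: "30B-A3B" means ~30e9 total parameters and ~3e9 active per token.
TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9

# Share of the model's weights that take part in processing one token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Common rule of thumb: a forward pass costs about 2 FLOPs
# per active parameter per token.
flops_per_token_moe = 2 * ACTIVE_PARAMS
flops_per_token_dense = 2 * TOTAL_PARAMS
compute_ratio = flops_per_token_moe / flops_per_token_dense

print(f"active fraction per token: {active_fraction:.0%}")
print(f"per-token compute vs. a dense model of the same size: {compute_ratio:.0%}")
```

This covers per-token compute only. Memory use still scales with the total parameter count, since all experts have to be loaded.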

In agentic systems, Nemotron 3 Nano Omni can work alongside other NVIDIA Nemotron open models, such as Nemotron 3 Super for high-frequency execution or Nemotron 3 Ultra for complex planning, as well as proprietary cloud models from other providers, to power sub-agents for workflows such as computer use, document intelligence and audio-video reasoning.

News.Az

By Ulviyya Salmanli