This is NVIDIA's home for open model weights, datasets, and interactive demos. Everything here is designed to give developers and researchers production-ready starting points for Generative AI, Physical AI, and agentic workflows – backed by the same research that powers NVIDIA's enterprise AI platform.

The Nemotron Family: Digital Intelligence

The Nemotron family is NVIDIA's lineup of purpose-built foundation models spanning language, reasoning, vision, retrieval, speech, and safety. Each model targets a specific performance profile - from ultra-efficient edge inference to heavyweight multi-turn agent orchestration - and ships with open weights, open datasets, and reproducible training recipes.

Language & Reasoning (Nemotron 3)

The core language model lineup, engineered for advanced reasoning and agentic tasks across a range of model sizes and deployment targets.

  • Nemotron 3 Nano: A highly efficient small language model (SLM) built on a hybrid Mamba-2 + Transformer MoE architecture (30B total / 3B active parameters), optimized for on-device agentic tasks. Features a 1M-token context window, reasoning ON/OFF modes with configurable thinking budgets, and up to 4× faster inference than its predecessor. Served via vLLM and SGLang.
  • Nemotron 3 Super: A hybrid Mamba-Transformer Mixture-of-Experts (MoE) model with 120 billion total and 12 billion active parameters per forward pass. Incorporates LatentMoE, Multi-Token Prediction (MTP) layers, and native NVFP4 pretraining to maximize compute efficiency and accuracy. Designed for complex multi-agent applications, it maintains a 1-million-token context window and delivers up to 5× higher throughput than the previous Nemotron Super.
  • Nemotron 3 Ultra: In development. A larger-parameter variant designed for advanced reasoning, coding, and multi-turn agent orchestration.
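
Since Nemotron 3 Nano is served via vLLM, a minimal client sketch against vLLM's OpenAI-compatible server might look like the following. The repo ID and the `/no_think` reasoning-OFF toggle are assumptions on our part, not documented strings; check the model card for the exact values.

```python
# Minimal sketch: querying a Nemotron 3 Nano model behind vLLM's
# OpenAI-compatible server (started with `vllm serve <model>`).
# The repo ID and the /no_think system toggle below are illustrative.
def ask(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    from openai import OpenAI  # third-party client; imported locally

    client = OpenAI(base_url=base_url, api_key="EMPTY")  # vLLM ignores the key
    resp = client.chat.completions.create(
        model="nvidia/Nemotron-3-Nano",  # hypothetical repo ID
        messages=[
            {"role": "system", "content": "/no_think"},  # assumed reasoning-OFF toggle
            {"role": "user", "content": prompt},
        ],
        max_tokens=256,
    )
    return resp.choices[0].message.content
```

The same client works against an SGLang deployment by pointing `base_url` at SGLang's OpenAI-compatible endpoint.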

Speech & Multimodal

NVIDIA provides a broad set of specialized multimodal foundations that integrate seamlessly with the Nemotron ecosystem — spanning speech recognition, multilingual translation, vision-language understanding, and real-time voice AI. These models are optimized for both cloud and edge deployment.

Speech Recognition & Translation (Nemotron Speech)

State-of-the-art, production-ready speech models from the NVIDIA NeMo Speech research team for ASR, TTS, speaker diarization, and speech-to-speech.

  • Parakeet: A family of FastConformer-based ASR (Automatic Speech Recognition) models achieving state-of-the-art WER (Word Error Rate). The latest parakeet-tdt-0.6b-v3 extends support to 25 European languages with automatic language detection, while the 1.1B English variant delivers maximum transcription accuracy.
  • Canary: Multilingual and multitask speech models capable of simultaneous translation and transcription across 25 languages, trained on NVIDIA's Granary dataset.
  • Nemotron Speech Streaming: A cache-aware streaming ASR model with native punctuation and capitalization, offering configurable chunk sizes (80ms–1120ms) for low-latency voice agent pipelines.
  • Parakeet Realtime EOU: A lightweight 120M-parameter streaming ASR model with built-in end-of-utterance detection at 80–160ms latency, purpose-built for voice AI agent turn-taking.
  • Multitalker Parakeet: Streaming multi-speaker ASR using speaker kernel injection - handles fully overlapped speech without requiring speaker enrollment audio.
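
As a concrete starting point, the Parakeet checkpoints above load through the NeMo toolkit. This sketch assumes `pip install "nemo_toolkit[asr]"` and a local audio file; treat it as illustrative rather than a tested recipe.

```python
# Sketch: offline transcription with a Parakeet checkpoint via NVIDIA NeMo.
# Requires `pip install "nemo_toolkit[asr]"`; the repo ID matches the
# parakeet-tdt-0.6b-v3 checkpoint described above.
def transcribe(paths: list[str]):
    import nemo.collections.asr as nemo_asr  # heavy import; kept local

    model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")
    # v3 auto-detects the language across its 25 supported European languages.
    return model.transcribe(paths)
```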

Vision & Document Intelligence

Vision-language models that bring multimodal understanding to documents, images, and video - from OCR and chart parsing to visual Q&A.

  • Nemotron Nano VL (12B): A vision-language model for multi-image reasoning, video understanding, and document intelligence. Built on the Nemotron Nano 2 backbone with a CRadioV2-H vision encoder. Trained on ~39.5M samples across 270 datasets. Available in BF16, FP8, and NVFP4-QAD.
  • Llama Nemotron Nano VL (8B): An 8B VLM specialized in document intelligence and OCR, with best-in-class DocVQA, ChartQA, and AI2D benchmarks. Deployable on edge devices including Jetson Orin via AWQ 4-bit quantization.
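
A loading sketch for the VLMs above, assuming the usual `transformers` pattern for NVIDIA vision-language checkpoints (custom modeling code, hence `trust_remote_code=True`). The repo ID is illustrative; consult the model card for the exact ID and preprocessing steps.

```python
# Sketch: loading a Nemotron VLM through Hugging Face transformers.
# The repo ID is a hypothetical placeholder; NVIDIA VLMs typically ship
# custom modeling code, which is why trust_remote_code=True is needed.
def load_vlm(repo_id: str = "nvidia/Nemotron-Nano-VL-12B"):
    from transformers import AutoModel, AutoTokenizer  # third-party; kept local

    model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    return model, tokenizer
```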

Nemotron RAG

A complete, modular retrieval-augmented generation stack — from document ingestion through semantic search to precision reranking — designed for production pipelines that handle text, images, and complex multimodal documents.
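
The shape of that pipeline can be sketched in a few lines. The bag-of-words scoring below is a toy stand-in: in a real deployment, stage 1 would be a Nemotron embedding model (a bi-encoder scanning the whole corpus) and stage 2 a Nemotron reranker (a cross-encoder scoring only the shortlist).

```python
# Toy sketch of the retrieve-then-rerank pattern behind a RAG stack.
# Both scoring functions are stand-ins for real embedding/reranking models.
def embed(text: str) -> set[str]:
    return set(text.lower().replace(".", "").split())  # stand-in embedding

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    # Stage 1: cheap, recall-oriented scoring over every document.
    q = embed(query)
    return sorted(docs, key=lambda d: -len(q & embed(d)))[:k]

def rerank(query: str, candidates: list[str]) -> list[str]:
    # Stage 2: precision-oriented scoring over the shortlist only.
    q = embed(query)
    return sorted(candidates,
                  key=lambda d: -len(q & embed(d)) / max(len(embed(d)), 1))

docs = [
    "Nemotron models ship with open weights.",
    "Cosmos generates physics-aware video.",
    "Retrieval pipelines pair an embedder with a reranker.",
]
query = "embedder and reranker pipeline"
best = rerank(query, retrieve(query, docs))[0]
```

The split matters because the reranker is far more expensive per pair: it only becomes affordable after the retriever has cut the corpus down to a handful of candidates.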

Community Collaborations

NVIDIA releases optimized and aligned versions of leading community architectures, leveraging proprietary alignment techniques (SteerLM, RLHF, RPO) and open datasets like HelpSteer2 to push open models further.

  • Llama-3.1-Nemotron: A diverse family of models where Llama 3.1 architectures are fine-tuned using NVIDIA's HelpSteer2 datasets to improve helpfulness and instruction adherence. Includes Ultra (253B), Super (49B), and Nano (8B) variants.
  • Mistral-NeMo: A 12B parameter model built in collaboration with Mistral AI, offering a high performance-to-size ratio and an expanded 128k context window.

Physical AI

NVIDIA Cosmos

NVIDIA Cosmos is a platform of generative World Foundation Models (WFMs), tokenizers, and data curation tools — purpose-built to model and simulate physical interactions for robotics and autonomous systems.

  • Cosmos Tokenizer: A suite of high-compression visual tokenizers for images and videos, achieving up to 2048× total compression while running up to 12× faster than the prior state of the art. Available in Continuous and Discrete variants, enabling efficient encoding/decoding for both diffusion and autoregressive modeling.
  • Cosmos Predict 2.5: Diffusion-transformer based models for generating high-fidelity, physics-aware images and videos from text, image, or video inputs. Available in 2B and 14B variants.
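
The 2048× figure decomposes as temporal × height × width downsampling; the 8×16×16 factorization below is one factorization consistent with that number (an assumption on our part, so verify against the tokenizer's model card).

```python
# Worked example: how a temporal x spatial downsampling factorization
# yields the stated 2048x total compression. Each latent token covers
# a t x h x w block of video pixels.
def token_grid(frames: int, height: int, width: int,
               t: int = 8, h: int = 16, w: int = 16):
    return (frames // t, height // h, width // w)

f, y, x = token_grid(frames=32, height=1024, width=1024)
pixels = 32 * 1024 * 1024
tokens = f * y * x                 # 4 * 64 * 64 = 16384 tokens
ratio = pixels // tokens           # 2048x compression
```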

NVIDIA GR00T

GR00T N1.5 VLA: NVIDIA's open foundation model for humanoid robot reasoning and control. Combines an Eagle-based vision-language backbone with a diffusion transformer (DiT) action head for language-conditioned manipulation across diverse embodiments. We have integrated GR00T N1.5 into LeRobot for policy post-training and deployment.

IsaacLab-Arena

IsaacLab-Arena: An open-source framework for large-scale, GPU-accelerated robot policy evaluation in simulation, built on top of IsaacLab. It provides modular APIs for task curation, automated diversification, and parallel benchmarking across embodiments and environments. We have integrated IsaacLab-Arena into LeRobot for scalable closed-loop policy evaluation and benchmarking, along with datasets and 250+ scenes from our partner Lightwheel AI, available on the Hugging Face Hub.

Nemotron Datasets

Every model NVIDIA ships rests on a data layer — and that data shapes how the model reasons, what it knows, and where it can be safely deployed. Nemotron Datasets are the open version of that foundation: web-scale pretraining corpora, alignment and reasoning data, multimodal grounding, and embodied AI simulation, released under permissive licenses with the training recipes and evaluation frameworks that produced them. Beyond Nemotron, NVIDIA's broader open data catalog spans 200+ releases across Physical AI and robotics, autonomous vehicles, biology and drug discovery, retrieval and evaluation benchmarks, and sovereign AI. Use the table below to find the right starting point for what you're trying to build.

Which dataset collection should I use?

| If you want to... | Use this collection | Start with these datasets |
| --- | --- | --- |
| **Foundation** | | |
| Pre-train a base model | Nemotron Pre-Training Collection | Nemotron-CC-v2.1, Nemotron-CC-Math-v1, Nemotron-CC-Code-v1, Nemotron-ClimbMix |
| **Build a capability** | | |
| Math reasoning, proofs, and quantitative problem-solving | Nemotron Math & Reasoning Collection | Nemotron-SFT-Math-v3, Nemotron-Math-v2, AceReason-Math, Nemotron-CC-Math-v1 |
| Code generation, debugging, and SWE workflows | Nemotron Code & SWE Collection | Nemotron-SFT-Competitive-Programming-v2, Nemotron-SFT-SWE-v2, Nemotron-CC-Code-v1 |
| Helpful, multi-turn, instruction-following chat | Nemotron Chat & Instruction Collection | Nemotron-SFT-Instruction-Following-Chat-v2, Nemotron-RL-instruction_following |
| Agentic and tool-use behavior | Nemotron Agentic Collection | Nemotron-SFT-Agentic-v2, Nemotron-RL-Agentic-Function-Calling-Pivot-v1 |
| Safety, refusals, and content moderation | Nemotron Safety Collection | Aegis-AI-Content-Safety-Dataset-2.0, Nemotron-Safety-Guard-Dataset-v3, Nemotron-PII |
| Image and document understanding | Nemotron Vision-Language Collection | Nemotron-VLM-Dataset-v2, Llama-Nemotron-VLM-Dataset-v1 |
| **Training stages** | | |
| RL data with verifiable rewards (math, code, agentic, instruction) | Nemotron Reinforcement Learning Collection | Nemotron-RL-math-OpenMathReasoning, Nemotron-RL-coding-competitive_coding, Nemotron-RL-Agentic-Function-Calling-Pivot-v1 |
| Train a reward model | Nemotron Reward Modeling Collection | HelpSteer3, HelpSteer2, Nemotron-RLHF-GenRM-v1 |
| Full post-training recipe (SFT + RL blend) | Nemotron Post-Training Blends Collection | Llama-Nemotron-Post-Training-Dataset, Nemotron-Post-Training-Dataset-v2, Nemotron-Cascade-2-SFT-Data |
| Evaluate model performance | Nemotron Eval & Benchmark Collection | SPEED-bench |
| **Specialized & sovereign** | | |
| Multilingual or domain-specific (e.g. finance) capability | Nemotron Supervised Fine-Tuning Collection | Nemotron-SFT-Multilingual-v1, Nemotron-SpecializedDomains-Finance-v1 |
| Diverse synthetic personas grounded in real population distributions | Nemotron Personas Collection | Nemotron-Personas-USA / India / Japan / Brazil / France / Singapore |
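
Any of these collections can be pulled with the Hugging Face `datasets` library; for web-scale corpora, streaming avoids downloading everything up front. The repo ID below follows the dataset names in the table and is assumed to live under the `nvidia` namespace; verify it on the Hub before use.

```python
# Sketch: streaming a few rows from a Nemotron dataset with the
# Hugging Face `datasets` library. The repo ID is assumed from the
# table above; confirm the exact ID and split on the Hub.
def stream_samples(repo_id: str = "nvidia/Nemotron-CC-v2.1", n: int = 4):
    from datasets import load_dataset  # third-party; kept local

    ds = load_dataset(repo_id, split="train", streaming=True)
    return [row for _, row in zip(range(n), ds)]
```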