This is NVIDIA's home for open model weights, datasets, and interactive demos. Everything here is designed to give developers and researchers production-ready starting points for Generative AI, Physical AI, and agentic workflows – backed by the same research that powers NVIDIA's enterprise AI platform.
The Nemotron Family: Digital Intelligence
The Nemotron family is NVIDIA's lineup of purpose-built foundation models spanning language, reasoning, vision, retrieval, speech, and safety. Each model targets a specific performance profile - from ultra-efficient edge inference to heavyweight multi-turn agent orchestration - and ships with open weights, open datasets, and reproducible training recipes.
Language & Reasoning (Nemotron 3)
The core language model lineup, engineered for advanced reasoning and agentic tasks across a range of model sizes and deployment targets.
- Nemotron 3 Nano: A highly efficient small language model (SLM) built on a hybrid Mamba-2 + Transformer MoE architecture (30B total / 3B active parameters), optimized for on-device agentic tasks. Features a 1M-token context window, reasoning ON/OFF modes with configurable thinking budgets, and up to 4× faster inference than its predecessor. Served via vLLM and SGLang.
- Nemotron 3 Super: A hybrid Mamba-Transformer Mixture-of-Experts (MoE) model with 120B total / 12B active parameters per forward pass. Incorporates LatentMoE, Multi-Token Prediction (MTP) layers, and native NVFP4 pretraining to maximize compute efficiency and accuracy. Designed for complex multi-agent applications, it maintains a 1M-token context window and delivers up to 5× higher throughput than the previous Nemotron Super.
- Nemotron 3 Ultra: In development. A larger-parameter variant designed for advanced reasoning, coding, and multi-turn agent orchestration.
Speech & Multimodal
NVIDIA provides a broad set of specialized multimodal foundations that integrate seamlessly with the Nemotron ecosystem — spanning speech recognition, multilingual translation, vision-language understanding, and real-time voice AI. These models are optimized for both cloud and edge deployment.
Speech Recognition & Translation (Nemotron Speech)
State-of-the-art, production-ready speech models from the NVIDIA NeMo Speech research team for ASR, TTS, speaker diarization, and speech-to-speech.
- Parakeet: A family of FastConformer-based ASR (Automatic Speech Recognition) models achieving state-of-the-art WER (Word Error Rate). The latest parakeet-tdt-0.6b-v3 extends support to 25 European languages with automatic language detection, while the 1.1B English variant delivers maximum transcription accuracy.
- Canary: Multilingual and multitask speech models capable of simultaneous translation and transcription across 25 languages, trained on NVIDIA's Granary dataset.
- Nemotron Speech Streaming: A cache-aware streaming ASR model with native punctuation and capitalization, offering configurable chunk sizes (80ms–1120ms) for low-latency voice agent pipelines.
- Parakeet Realtime EOU: A lightweight 120M-parameter streaming ASR model with built-in end-of-utterance detection at 80–160ms latency, purpose-built for voice AI agent turn-taking.
- Multitalker Parakeet: Streaming multi-speaker ASR using speaker kernel injection - handles fully overlapped speech without requiring speaker enrollment audio.
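The chunk sizes quoted for the streaming models map directly onto audio buffer lengths. A back-of-the-envelope sketch, assuming the 16 kHz mono input common to NeMo ASR models (the sample rate is an assumption here; confirm it against the specific model card):

```python
SAMPLE_RATE_HZ = 16_000  # assumed input rate; most NeMo ASR models take 16 kHz mono

def chunk_samples(chunk_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of PCM samples the streaming encoder consumes per chunk."""
    return sample_rate_hz * chunk_ms // 1000

# The 80 ms - 1120 ms range quoted for Nemotron Speech Streaming:
for ms in (80, 160, 560, 1120):
    print(f"{ms:>5} ms chunk -> {chunk_samples(ms):>6} samples")
```

Smaller chunks cut perceived latency for a voice agent but give the encoder less acoustic context per step, which is the trade-off the configurable chunk size exposes.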
Vision & Document Intelligence
Vision-language models that bring multimodal understanding to documents, images, and video - from OCR and chart parsing to visual Q&A.
- Nemotron Nano VL (12B): A vision-language model for multi-image reasoning, video understanding, and document intelligence. Built on the Nemotron Nano 2 backbone with a CRadioV2-H vision encoder. Trained on ~39.5M samples across 270 datasets. Available in BF16, FP8, and NVFP4-QAD.
- Llama Nemotron Nano VL (8B): An 8B VLM specialized in document intelligence and OCR, with best-in-class DocVQA, ChartQA, and AI2D benchmarks. Deployable on edge devices including Jetson Orin via AWQ 4-bit quantization.
Nemotron RAG
A complete, modular retrieval-augmented generation stack — from document ingestion through semantic search to precision reranking — designed for production pipelines that handle text, images, and complex multimodal documents.
- Extract: Models for ingesting and extracting structured information from multimodal sources, including charts, tables, and scanned documents (e.g. Nemotron Parse v1.1, Nemotron OCR and Object Detection).
- Embed: Multimodal embedding models that map text, images, or audio into shared semantic vector spaces for search and retrieval (e.g. Llama Nemotron Embed VL 1B v2).
- Rerank: Cross-encoder reranking models that rescore retrieved candidates using deeper relevance modeling (e.g. Llama Nemotron Rerank VL 1B v2).
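The Extract → Embed → Rerank flow above is the standard two-stage retrieval pattern: a cheap bi-encoder narrows the corpus by vector similarity, then a cross-encoder rescores the shortlist with deeper relevance modeling. A minimal sketch with stubbed scoring functions; the `embed` and `rerank` bodies below are toy placeholders standing in for the Nemotron models, not their real APIs:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def embed(text: str) -> list[float]:
    """Stub for an embedding model (e.g. Llama Nemotron Embed VL 1B v2).
    Toy bag-of-characters vector, just to make the pipeline run end to end."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def rerank(query: str, candidates: list[str]) -> list[str]:
    """Stub for a cross-encoder reranker: scores by shared word count."""
    q_words = set(query.lower().split())
    return sorted(candidates, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)

corpus = [
    "GPU memory bandwidth explained",
    "How to bake sourdough bread",
    "Scaling GPU clusters for training",
]
query = "GPU training clusters"

# Stage 1: dense retrieval — keep the top-2 candidates by embedding similarity.
qv = embed(query)
shortlist = sorted(corpus, key=lambda d: cosine(qv, embed(d)), reverse=True)[:2]
# Stage 2: rerank the shortlist with the (stub) cross-encoder.
best = rerank(query, shortlist)[0]
print(best)
```

The design point is that the reranker only ever sees the shortlist, so its higher per-pair cost stays bounded regardless of corpus size.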
Community Collaborations
NVIDIA releases optimized and aligned versions of leading community architectures, leveraging proprietary alignment techniques (SteerLM, RLHF, RPO) and open datasets like HelpSteer2 to push open models further.
- Llama-3.1-Nemotron: A diverse family of models where Llama 3.1 architectures are fine-tuned using NVIDIA's HelpSteer2 datasets to improve helpfulness and instruction adherence. Includes Ultra (253B), Super (49B), and Nano (8B) variants.
- Mistral-NeMo: A 12B parameter model built in collaboration with Mistral AI, offering a high performance-to-size ratio and an expanded 128k context window.
Physical AI
NVIDIA Cosmos
NVIDIA Cosmos is a platform of generative World Foundation Models (WFMs), tokenizers, and data curation tools — purpose-built to model and simulate physical interactions for robotics and autonomous systems.
- Cosmos Tokenizer: A suite of high-compression visual tokenizers for images and videos, achieving up to 2048× total compression while running up to 12× faster than prior state-of-the-art tokenizers. Available in Continuous and Discrete variants, enabling efficient encoding/decoding for both diffusion and autoregressive modeling.
- Cosmos Predict 2.5: Diffusion-transformer based models for generating high-fidelity, physics-aware images and videos from text, image, or video inputs. Available in 2B and 14B variants.
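The 2048× total compression figure is the product of temporal and per-axis spatial compression. For instance, a video tokenizer that downsamples 8× in time and 16× along each spatial axis reaches 8 × 16 × 16 = 2048×. The factorization and the causal frame handling below are illustrative assumptions; see the Cosmos Tokenizer model cards for the actual configurations:

```python
def total_compression(temporal: int, spatial: int) -> int:
    """Total compression for a video tokenizer that downsamples
    `temporal`x in time and `spatial`x along both height and width."""
    return temporal * spatial * spatial

# Illustrative factorization reaching the quoted 2048x figure:
print(total_compression(temporal=8, spatial=16))  # 8 * 16 * 16 = 2048

# Latent size for a 33-frame 1024x1024 clip under that scheme.
# Causal video tokenizers typically keep 1 + (T - 1) // temporal latent
# frames (an assumption here), so:
frames, h, w = 33, 1024, 1024
latent_frames = 1 + (frames - 1) // 8
tokens = latent_frames * (h // 16) * (w // 16)
print(tokens)  # 5 * 64 * 64 = 20480
```

This is the lever that makes both diffusion and autoregressive world models tractable: the downstream model operates on ~20K latents instead of ~100M raw pixels.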
NVIDIA GR00T
GR00T N1.5 VLA: NVIDIA's open foundation model for humanoid robot reasoning and control. Combines an Eagle-based vision-language backbone with a diffusion transformer (DiT) action head for language-conditioned manipulation across diverse embodiments. We have integrated GR00T N1.5 into LeRobot for policy post-training and deployment.
IsaacLab-Arena
IsaacLab-Arena: An open-source framework for large-scale, GPU-accelerated robot policy evaluation in simulation, built on top of IsaacLab. Provides modular APIs for task curation, automated diversification, and parallel benchmarking across embodiments and environments. We have integrated IsaacLab-Arena into LeRobot for scalable closed-loop policy evaluation and benchmarking, along with datasets and 250+ scenes from our partner Lightwheel AI on the Hugging Face Hub.
Nemotron Datasets
Every model NVIDIA ships rests on a data layer — and that data shapes how the model reasons, what it knows, and where it can be safely deployed. Nemotron Datasets are the open version of that foundation: web-scale pretraining corpora, alignment and reasoning data, multimodal grounding, and embodied AI simulation, released under permissive licenses with the training recipes and evaluation frameworks that produced them. Beyond Nemotron, NVIDIA's broader open data catalog spans 200+ releases across Physical AI and robotics, autonomous vehicles, biology and drug discovery, retrieval and evaluation benchmarks, and sovereign AI. Use the table below to find the right starting point for what you're trying to build.
Which dataset collection should I use?
| If you want to... | Use this Collection | Start with these datasets |
|---|---|---|
| **FOUNDATION** | | |
| Pre-train a base model | Nemotron Pre-Training Collection | Nemotron-CC-v2.1, Nemotron-CC-Math-v1, Nemotron-CC-Code-v1, Nemotron-ClimbMix |
| **BUILD A CAPABILITY** | | |
| Math reasoning, proofs, and quantitative problem-solving | Nemotron Math & Reasoning Collection | Nemotron-SFT-Math-v3, Nemotron-Math-v2, AceReason-Math, Nemotron-CC-Math-v1 |
| Code generation, debugging, and SWE workflows | Nemotron Code & SWE Collection | Nemotron-SFT-Competitive-Programming-v2, Nemotron-SFT-SWE-v2, Nemotron-CC-Code-v1 |
| Helpful, multi-turn, instruction-following chat | Nemotron Chat & Instruction Collection | Nemotron-SFT-Instruction-Following-Chat-v2, Nemotron-RL-instruction_following |
| Agentic and tool-use behavior | Nemotron Agentic Collection | Nemotron-SFT-Agentic-v2, Nemotron-RL-Agentic-Function-Calling-Pivot-v1 |
| Safety, refusals, and content moderation | Nemotron Safety Collection | Aegis-AI-Content-Safety-Dataset-2.0, Nemotron-Safety-Guard-Dataset-v3, Nemotron-PII |
| Image and document understanding | Nemotron Vision-Language Collection | Nemotron-VLM-Dataset-v2, Llama-Nemotron-VLM-Dataset-v1 |
| **TRAINING STAGES** | | |
| RL data with verifiable rewards (math, code, agentic, instruction) | Nemotron Reinforcement Learning Collection | Nemotron-RL-math-OpenMathReasoning, Nemotron-RL-coding-competitive_coding, Nemotron-RL-Agentic-Function-Calling-Pivot-v1 |
| Train a reward model | Nemotron Reward Modeling Collection | HelpSteer3, HelpSteer2, Nemotron-RLHF-GenRM-v1 |
| Full post-training recipe (SFT + RL blend) | Nemotron Post-Training Blends Collection | Llama-Nemotron-Post-Training-Dataset, Nemotron-Post-Training-Dataset-v2, Nemotron-Cascade-2-SFT-Data |
| Evaluate model performance | Nemotron Eval & Benchmark Collection | SPEED-bench |
| **SPECIALIZED & SOVEREIGN** | | |
| Multilingual or domain-specific (e.g. finance) capability | Nemotron Supervised Fine-Tuning Collection | Nemotron-SFT-Multilingual-v1, Nemotron-SpecializedDomains-Finance-v1 |
| Diverse synthetic personas grounded in real population distributions | Nemotron Personas Collection | Nemotron-Personas-USA / India / Japan / Brazil / France / Singapore |
Spaces
- Music Flamingo: Analyze music and answer questions from audio or YouTube links.
- VoMP: Volumetric physics materials for interactive worlds.
- LLM RTL Coding Errors Explainer: How LLMs fail and generalize in RTL coding.
- Kimodo: Generate high-quality motions from text prompts.
- KVPress Leaderboard: Benchmark KV cache compression methods.
- Audio Flamingo 3 Demo.
