How to Build a Healthcare Robot from Simulation to Deployment with NVIDIA Isaac for Healthcare Oct 28 • 18
NVIDIA Releases 8 Million Sample Open Dataset and Tooling for OCR, Image Reasoning, Image and Video QA Tasks Oct 28 • 16
Llama‑Embed‑Nemotron‑8B Text Embedding Model Ranks First on Multilingual MTEB Leaderboard Oct 21 • 14
📢 NVIDIA Releases Nemotron-CC-Math Pre-Training Dataset: A High-Quality, Web-Scale Math Corpus for Pretraining Large Language Models Aug 18 • 5
NVIDIA Releases Improved Pretraining Dataset: Preserves High Value Math & Code, and Augments with Multi-Lingual Aug 18 • 3
NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks Aug 11 • 75
Llama-NeMoRetriever-ColEmbed: Developer-Focused Guide to NVIDIA's State-of-the-Art Text-Image Retrieval Jul 9 • 4
Nemotron-Personas: Improve AI Training With the First Synthetic Personas Dataset Aligned to Real-World Distributions Jun 10 • 21
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset Paper • 2504.16891 • Published Apr 23 • 25
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models Paper • 2504.03624 • Published Apr 4 • 15
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data Paper • 2410.01560 • Published Oct 2, 2024 • 4
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published Apr 21 • 67
Do Audio-Language Models Understand Linguistic Variations? Paper • 2410.16505 • Published Oct 21, 2024 • 1
FFN Fusion: Rethinking Sequential Computation in Large Language Models Paper • 2503.18908 • Published Mar 24 • 19
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control Paper • 2503.14492 • Published Mar 18 • 20
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 96
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Paper • 2411.19146 • Published Nov 28, 2024 • 17
VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion Paper • 2302.12251 • Published Feb 23, 2023
Prismer: A Vision-Language Model with An Ensemble of Experts Paper • 2303.02506 • Published Mar 4, 2023 • 2
SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street Views Paper • 2306.09001 • Published Jun 15, 2023