- Scaling RL to Long Videos
  Paper • 2507.07966 • Published • 159
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 313
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
  Paper • 2507.14111 • Published • 23
- MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
  Paper • 2507.21183 • Published • 14
Collections
Collections including paper arxiv:2506.19767

- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  Paper • 2504.13837 • Published • 138
- TTRL: Test-Time Reinforcement Learning
  Paper • 2504.16084 • Published • 120
- What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
  Paper • 2503.24235 • Published • 54
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 187

- Large Language Diffusion Models
  Paper • 2502.09992 • Published • 123
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
  Paper • 2502.10391 • Published • 34
- Diverse Inference and Verification for Advanced Reasoning
  Paper • 2502.09955 • Published • 18
- Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models
  Paper • 2502.08130 • Published • 9

- OpenAI o1 System Card
  Paper • 2412.16720 • Published • 36
- LearnLM: Improving Gemini for Learning
  Paper • 2412.16429 • Published • 22
- NILE: Internal Consistency Alignment in Large Language Models
  Paper • 2412.16686 • Published • 8
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38

- Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
  Paper • 2506.06395 • Published • 133
- Discrete Diffusion in Large Language and Multimodal Models: A Survey
  Paper • 2506.13759 • Published • 43
- LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
  Paper • 2506.18841 • Published • 56
- SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
  Paper • 2506.19767 • Published • 15

- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 7.7k • 1.22k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 15
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 140 • 15
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 63

- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4

- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- DeepSeek-V3 Technical Report
  Paper • 2412.19437 • Published • 73
- Slamming: Training a Speech Language Model on One GPU in a Day
  Paper • 2502.15814 • Published • 69
- SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
  Paper • 2506.19767 • Published • 15