BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning Paper • 2603.04918 • Published 1 day ago • 41
Advancing Block Diffusion Language Models for Test-Time Scaling Paper • 2602.09555 • Published 24 days ago • 3
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model Paper • 2310.01412 • Published Oct 2, 2023 • 1
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025 • 179
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published about 1 month ago • 22
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published 29 days ago • 59
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing Paper • 2601.21459 • Published Jan 29 • 9
SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning Paper • 2602.02472 • Published Feb 2 • 44
Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection Paper • 2512.16905 • Published Dec 18, 2025 • 32