Self-Improving Language Models with Bidirectional Evolutionary Search Paper • 2605.28814 • Published 11 days ago • 59
OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning Paper • 2605.28691 • Published 11 days ago • 24
HumanNet: Scaling Human-centric Video Learning to One Million Hours Paper • 2605.06747 • Published about 1 month ago • 52
Look-Back: Implicit Visual Re-focusing in MLLM Reasoning Paper • 2507.03019 • Published Jul 2, 2025 • 1
CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step Paper • 2507.04451 • Published Jul 6, 2025
Reinforcement Learning with Inverse Rewards for World Model Post-training Paper • 2509.23958 • Published Sep 28, 2025
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 352
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs Paper • 2603.22446 • Published Mar 23 • 10
Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells Paper • 2603.25240 • Published Mar 26 • 77
Enhancing Spatial Understanding in Image Generation via Reward Modeling Paper • 2602.24233 • Published Feb 27 • 60