MusiCRS: Benchmarking Audio-Centric Conversational Recommendation Paper • 2509.19469 • Published Sep 23, 2025
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning Paper • 2605.02913 • Published Apr 8 • 9
F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking Paper • 2605.12995 • Published 8 days ago • 2
MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization Paper • 2605.10784 • Published 10 days ago • 1