On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification • Paper • 2508.05629 • Published Aug 7, 2025 • 181
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 • Image-to-Text • 402B • Updated May 22, 2025 • 45.6k • 150