Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published 16 days ago • 319
view article Article Red Teaming with RL: Exploiting Tinker API for Harmful RL on 235B Model Jan 1 • 19
view article Article Red Teaming with RL: Exploiting Tinker API for Harmful RL on 235B Model Jan 1 • 19