Renjie-Ranger/FCP_big_math_pro_C-plus_no_concise
Collections for the paper "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" (https://arxiv.org/pdf/2509.22638)