Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL • Paper • 2602.03773 • Published 25 days ago • 11
QED-Nano: Teaching a Tiny Model to Prove Hard Theorems 📝 • Space • 55 • Who needs 1T parameters? Olympiad proofs with a 4B model
HerrHruby/answerbench_offline_acemath_rl_4b_inst_hard_with_dishsoap_16k_self_refine_step_70 • Dataset • Updated Jan 25 • 3.2k • 10
HerrHruby/answerbench_offline_acemath_rl_4b_hard_with_dishsoap_16k_self_verify_step_80 • Dataset • Updated Jan 25 • 1.6k • 9
HerrHruby/aime_offline_acemath_rl_4b_inst_hard_with_dishsoap_16k_self_refine_step_70 • Dataset • Updated Jan 25 • 240 • 10
HerrHruby/aime_offline_acemath_rl_4b_hard_with_dishsoap_16k_self_verify_step_80 • Dataset • Updated Jan 25 • 240 • 10