FAPO

Updated Oct 24

FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning. Project Page: https://fapo-rl.github.io/

dyyyyyyyy/FAPO-GenRM-4B

Text Generation • 4B • Updated Oct 31 • 100 • 1

Note: 4B Generative Reward Model for FAPO Reinforcement Learning.

dyyyyyyyy/FAPO-32B

33B • Updated Oct 28 • 14 • 1

Note: 32B FAPO Reasoning Model Trained with Generative Reward.

dyyyyyyyy/FAPO-Critic

Viewer • Updated Oct 31 • 87k • 76

Note: Training and Evaluation Dataset for FAPO-GenRM-4B (Generative Reward Model).

dyyyyyyyy/FAPO-Reasoning-Dataset

Viewer • Updated Oct 28 • 351k • 106

Note: Training and Evaluation Dataset for FAPO-32B (Reasoning Model).