dyyyyyyyy/FAPO-GenRM-4B
Text Generation • 4B
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning. Project Page: https://fapo-rl.github.io/
Note: 4B generative reward model for FAPO reinforcement learning.
Note: 32B FAPO reasoning model trained with generative reward.
Note: Training and evaluation dataset for FAPO-GenRM-4B (generative reward model).
Note: Training and evaluation dataset for FAPO-32B (reasoning model).