Pengyu Cheng
Linear95
AI & ML interests
None yet
Recent Activity
upvoted a paper about 4 hours ago
GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization
upvoted a paper 8 days ago
MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination
upvoted a paper 8 days ago
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR