Renjie-Ranger's Collections

Feedback_Conditional_Policy

updated 5 days ago

Collection for the paper "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" (https://arxiv.org/pdf/2509.22638)

  • Renjie-Ranger/FCP_big_math_pro_C-plus_no_concise

    Viewer • Updated Sep 25, 2025 • 185k • 10

  • Renjie-Ranger/FCP_general_reasoner_pro_C-plus_no_concise

    Viewer • Updated Sep 25, 2025 • 133k • 8

  • Renjie-Ranger/FCP_general_reasoner_pro_SFT

    Viewer • Updated Sep 26, 2025 • 272k • 6

  • Renjie-Ranger/FCP_big_math_pro_SFT

    Viewer • Updated Sep 26, 2025 • 384k • 19 • 1

  • Renjie-Ranger/FCP-Bootstrap_Qwen2.5-7B

    8B • Updated 5 days ago • 7

  • Renjie-Ranger/Base-GRPO_Qwen2.5-7B

    8B • Updated 5 days ago • 6

  • Renjie-Ranger/RFT-GRPO_Qwen2.5-7B

    8B • Updated 5 days ago • 4