Sangsang/Olmo-3-7B-Instruct-SFT-ContextGRPOwDistill_2x4_eps20 Text Generation • Updated 3 days ago • 10
Sangsang/feedback_asymmetric_kl_fixed_ema_Qwen2.5-7B-Instruct_bw0p75_fw0p25_ema0p999_ep30 Text Generation • Updated 13 days ago • 11
Sangsang/feedback_asymmetric_kl_fixed_ema_Qwen2.5-7B-Instruct_bw0p25_fw0p75_ema0p999_ep30 Text Generation • Updated 13 days ago • 13
Sangsang/feedback_asymmetric_kl_fixed_ema_Llama-3.1-8B-Instruct_bw0p75_fw0p25_ema0p999_ep30 Text Generation • Updated 13 days ago • 14
Sangsang/feedback_asymmetric_kl_fixed_ema_Llama-3.1-8B-Instruct_bw0p25_fw0p75_ema0p999_ep30 Text Generation • Updated 13 days ago • 14
Sangsang/feedback_asymmetric_kl_fixed_ema_Qwen3-14B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 21 days ago • 13
Sangsang/grpo_Qwen3-0.6B_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated 24 days ago • 12
Sangsang/feedback_asymmetric_fixed_ema_Llama-3.1-8B-Instruct_bw0p5_fw0p5_ema0p999_ep30_v2 Text Generation • Updated 25 days ago • 21
Sangsang/grpo_Qwen3-1.7B_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated about 1 month ago
Sangsang/grpo_Qwen3-4B_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated Apr 4 • 7
Sangsang/feedback_asymmetric_fixed_ema_DeepSeek-R1-Distill-Qwen-7B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated Apr 3
Sangsang/feedback_asymmetric_fixed_ema_DeepSeek-R1-Distill-Llama-8B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated Apr 3 • 10