arxiv:2602.08222

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Published on Feb 9

Submitted by Yikun B on Feb 10

#2 Paper of the day

Authors:

Zehao Chen, Yikun Ban

Abstract

AI-generated summary: WMSS is a post-training paradigm that uses weak model checkpoints to identify and fill learning gaps, enabling continued improvement beyond conventional saturation points in large language models.

As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative supervision signals remain latent in models' own historical weak states. Motivated by this observation, we propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that leverages weak checkpoints to guide continued optimization. By identifying recoverable learning gaps via entropy dynamics and reinforcing them through compensatory learning, WMSS enables strong agents to improve beyond conventional post-training saturation. Experiments on mathematical reasoning and code generation datasets show that agents trained with our approach achieve effective performance improvements, while incurring zero additional inference cost.

Community

Yikunb

Paper author, paper submitter

Weak-Driven Learning refers to a class of post-training paradigms in which the improvement of a strong model is driven by systematic discrepancies between its predictions and those of a weaker reference model (e.g., a historical checkpoint), rather than by imitation of a stronger teacher.
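To make the idea of a "systematic discrepancy" between a strong model and a weaker reference concrete, here is a minimal sketch that compares the two models' next-token distributions position by position. It is an illustration only and not the WMSS procedure from the paper: the function names (`discrepancy_report` and its helpers), the choice of entropy difference and KL divergence as discrepancy measures, and the toy logits are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats, with q clamped below by eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def discrepancy_report(
    weak_logits: Sequence[Sequence[float]],
    strong_logits: Sequence[Sequence[float]],
) -> List[Dict[str, float]]:
    """Hypothetical helper: per-position comparison of a weak and a strong model.

    Each input is a list of logit vectors, one per token position, over the
    same vocabulary. These measures are illustrative assumptions, not the
    paper's definition of a learning gap.
    """
    report = []
    for w, s in zip(weak_logits, strong_logits):
        p_weak, p_strong = softmax(w), softmax(s)
        h_weak, h_strong = entropy(p_weak), entropy(p_strong)
        report.append(
            {
                "entropy_weak": h_weak,
                "entropy_strong": h_strong,
                # Positive when the strong model is more confident than the weak one.
                "entropy_drop": h_weak - h_strong,
                # How far the strong distribution has moved from the weak one.
                "kl_strong_vs_weak": kl_divergence(p_strong, p_weak),
            }
        )
    return report


if __name__ == "__main__":
    # Toy logits over a 4-token vocabulary at two positions (made-up values).
    weak = [[0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0]]
    strong = [[10.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0]]
    for position, row in enumerate(discrepancy_report(weak, strong)):
        print(position, row)
```

In this toy input the two models disagree at the first position and agree exactly at the second, so only the first position shows a nonzero entropy drop and KL divergence. How such per-token signals would be turned into a training objective is left to the paper itself.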