M Saad Salman

MSS444

AI & ML interests

None yet

Recent Activity

upvoted a paper about 6 hours ago

Reward Hacking in Rubric-Based Reinforcement Learning

upvoted a paper about 6 hours ago

Do not copy and paste! Rewriting strategies for code retrieval

upvoted a paper about 6 hours ago

Continual Harness: Online Adaptation for Self-Improving Foundation Agents

Organizations

None yet

upvoted 5 papers about 6 hours ago

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

Paper • 2605.12483 • Published 1 day ago • 7

Teaching Language Models to Think in Code

Paper • 2605.07237 • Published 3 days ago • 17

upvoted a paper 2 days ago

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Paper • 2605.08083 • Published 6 days ago • 60

upvoted 5 papers 5 days ago

AI Co-Mathematician: Accelerating Mathematicians with Agentic AI

Paper • 2605.06651 • Published 7 days ago • 13

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

Paper • 2605.06130 • Published 7 days ago • 94

SkillOS: Learning Skill Curation for Self-Evolving Agents

Paper • 2605.06614 • Published 7 days ago • 38

Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes

Paper • 2605.05724 • Published 7 days ago • 14

Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration

Paper • 2605.05566 • Published 7 days ago • 35

upvoted 9 papers 9 days ago

Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding

Paper • 2604.26779 • Published 15 days ago • 13

Large Language Models Explore by Latent Distilling

Paper • 2604.24927 • Published 17 days ago • 74

Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization

Paper • 2604.24952 • Published 17 days ago • 6

Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

Paper • 2604.27251 • Published 15 days ago • 8

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Paper • 2604.24954 • Published 17 days ago • 22

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

Paper • 2604.27039 • Published 15 days ago • 24

Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

Paper • 2604.28158 • Published 14 days ago • 47

Efficient Training on Multiple Consumer GPUs with RoundPipe

Paper • 2604.27085 • Published 15 days ago • 40

Co-Evolving Policy Distillation

Paper • 2604.27083 • Published 15 days ago • 64
