Joel Wang

joelhenwang

Recent Activity

upvoted 20 papers 3 days ago

δ-mem: Efficient Online Memory for Large Language Models

Paper • 2605.12357 • Published 12 days ago • 120

Process Rewards with Learned Reliability

Paper • 2605.15529 • Published 9 days ago • 51

Co-Evolving Policy Distillation

Paper • 2604.27083 • Published 25 days ago • 65

Self-Distilled Agentic Reinforcement Learning

Paper • 2605.15155 • Published 10 days ago • 108

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published 12 days ago • 189

Learning to Discover at Test Time

Paper • 2601.16175 • Published Jan 22 • 45

Reinforcement Learning via Self-Distillation

Paper • 2601.20802 • Published Jan 28 • 47

Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

Paper • 2601.02151 • Published Jan 5 • 115

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs

Paper • 2601.08763 • Published Jan 13 • 150

Improving Data and Reward Design for Scientific Reasoning in Large Language Models

Paper • 2602.08321 • Published Feb 9 • 44

SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning

Paper • 2602.13515 • Published Feb 13 • 45

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Paper • 2602.01058 • Published Feb 1 • 45

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

Paper • 2602.05261 • Published Feb 5 • 53

BitDance: Scaling Autoregressive Generative Models with Binary Tokens

Paper • 2602.14041 • Published Feb 15 • 54

HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation

Paper • 2602.18283 • Published Feb 20 • 57

SLA2: Sparse-Linear Attention with Learnable Routing and QAT

Paper • 2602.12675 • Published Feb 13 • 59

Experiential Reinforcement Learning

Paper • 2602.13949 • Published Feb 15 • 75

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

Paper • 2602.06717 • Published Feb 6 • 75

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Paper • 2602.12036 • Published Feb 12 • 94

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Paper • 2601.22975 • Published Jan 30 • 112