Joakim Lee

Reinforcement4All

AI & ML interests

None yet

Recent Activity

upvoted a paper about 10 hours ago

s2n-bignum-bench: A practical benchmark for evaluating low-level code reasoning of LLMs

upvoted a paper about 10 hours ago

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

upvoted a paper about 10 hours ago

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

Organizations

None yet

upvoted 20 papers about 10 hours ago

s2n-bignum-bench: A practical benchmark for evaluating low-level code reasoning of LLMs

Paper • 2603.14628 • Published 12 days ago • 3

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

Paper • 2603.19470 • Published 8 days ago • 3

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

Paper • 2603.19209 • Published 8 days ago • 5

AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science

Paper • 2603.19005 • Published 8 days ago • 6

Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Paper • 2603.20155 • Published 7 days ago • 8

Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States

Paper • 2603.19987 • Published 7 days ago • 9

EgoForge: Goal-Directed Egocentric World Simulator

Paper • 2603.20169 • Published 7 days ago • 9

WorldAgents: Can Foundation Image Models be Agents for 3D World Models?

Paper • 2603.19708 • Published 8 days ago • 12

LoopRPT: Reinforcement Pre-Training for Looped Language Models

Paper • 2603.19714 • Published 8 days ago • 13

A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

Paper • 2603.19685 • Published 8 days ago • 18

Hyperagents

Paper • 2603.19461 • Published 8 days ago • 35

ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models

Paper • 2603.19466 • Published 8 days ago • 39

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

Paper • 2603.17051 • Published 10 days ago • 103

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Paper • 2603.17024 • Published 10 days ago • 104

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Paper • 2603.18886 • Published 8 days ago • 6

OSM-based Domain Adaptation for Remote Sensing VLMs

Paper • 2603.11804 • Published 16 days ago • 7

MOSS-TTS Technical Report

Paper • 2603.18090 • Published 10 days ago • 10

MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning

Paper • 2603.16929 • Published 14 days ago • 12

ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents

Paper • 2603.18815 • Published 8 days ago • 14

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Paper • 2603.15030 • Published 12 days ago • 21