Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning Paper • 2605.06326 • Published 8 days ago • 24
Monthly-SWEBench Collection A continuously updated benchmark evaluating AI coding agents on real-world software engineering tasks from GitHub issues. • 2 items • Updated 2 days ago • 1
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning Paper • 2603.03379 • Published Mar 3 • 32
BabyVision Collection State-of-the-art MLLMs achieve PhD-level language reasoning but struggle with visual tasks that 3-year-olds solve effortlessly. • 2 items • Updated Jan 10 • 4