Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published 6 days ago • 74
SuperLocalMemory V3.3: The Living Brain -- Biologically-Inspired Forgetting, Cognitive Quantization, and Multi-Channel Retrieval for Zero-LLM Agent Memory Systems Paper • 2604.04514 • Published Apr 6 • 7
Map2World: Segment Map Conditioned Text to 3D World Generation Paper • 2605.00781 • Published 14 days ago • 25
GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers Paper • 2604.02648 • Published Apr 3 • 47
Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising Paper • 2604.26694 • Published 16 days ago • 6
Building a Precise Video Language with Human-AI Oversight Paper • 2604.21718 • Published 23 days ago • 16
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published 21 days ago • 226
CreativeGame: Toward Mechanic-Aware Creative Game Generation Paper • 2604.19926 • Published 24 days ago • 2
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2604.12374 • Published about 1 month ago • 36
Toward Autonomous Long-Horizon Engineering for ML Research Paper • 2604.13018 • Published about 1 month ago • 34
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Paper • 2604.11784 • Published Apr 13 • 143
SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding Paper • 2604.09557 • Published Feb 10 • 11
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs Paper • 2604.10480 • Published Apr 12 • 20
On Semiotic-Grounded Interpretive Evaluation of Generative Art Paper • 2604.08641 • Published Apr 9 • 4
Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance Paper • 2604.01848 • Published Apr 3 • 5
Large Language Models Align with the Human Brain during Creative Thinking Paper • 2604.03480 • Published Apr 3 • 6
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published Apr 10 • 50