AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 2 days ago
Advancing Creative Physical Intelligence in Large Multimodal Models Paper • 2605.26396 • Published 12 days ago
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing Paper • 2605.02910 • Published about 1 month ago
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22, 2026
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20, 2026
NarrativeTrack: Evaluating Video Language Models Beyond the Frame Paper • 2601.01095 • Published Jan 3, 2026
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks Paper • 2502.17832 • Published Feb 25, 2025
EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities Paper • 2510.27545 • Published Oct 31, 2025
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29, 2025
Multimodal Policy Internalization for Conversational Agents Paper • 2510.09474 • Published Oct 10, 2025
Where LLM Agents Fail and How They can Learn From Failures Paper • 2509.25370 • Published Sep 29, 2025
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games Paper • 2509.01052 • Published Sep 1, 2025
Perception-Aware Policy Optimization for Multimodal Reasoning Paper • 2507.06448 • Published Jul 8, 2025
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published Jul 2, 2025
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14, 2025
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs Paper • 2504.17040 • Published Apr 23, 2025
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published Mar 3, 2025