WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 8 days ago • 99
Gliner Guard v1 Collection GLiNER2-based guardrail for PII, content safety classification, prompt attack detection, and more via a single forward pass • 5 items • Updated May 9 • 7
Article PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend PaddlePaddle • 28 days ago • 37
Article MTEB Leaderboard: From a slow demo to feature-rich leaderboard Samoed • 3 days ago • 21
SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents Paper • 2606.05761 • Published 12 days ago • 19
When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents Paper • 2606.05806 • Published 12 days ago • 23
LLM Explainability with Counterfactual Chains and Causal Graphs Paper • 2606.05972 • Published 12 days ago • 17
Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs Paper • 2605.30611 • Published 19 days ago • 193
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation Paper • 2605.31264 • Published 18 days ago • 112
Trust-Region Behavior Blending for On-Policy Distillation Paper • 2605.31159 • Published 18 days ago • 66
JLT: Clean-Latent Prediction in Latent Diffusion Transformers Paper • 2605.27102 • Published 21 days ago • 33
Macaron-A2UI: A Model for Generative UI in Personal Agents Paper • 2605.24830 • Published 23 days ago • 82
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published 27 days ago • 110
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published 25 days ago • 230
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 20 days ago • 423
ResearchMath-14K: Scaling Research-Level Mathematics via Agents Paper • 2605.28003 • Published 20 days ago • 50
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation Paper • 2605.23271 • Published 25 days ago • 80