Announcing AutoBench Agentic: The Next Generation Agentic Benchmark. 2 days ago • 2
Running Agents 3 AutoBench Leaderboard 👀 3 Multi-run AutoBench leaderboard with historical navigation
Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2. Dec 17, 2025 • 1
AutoBench Goes to the Farm with Evja: The First Ever Agronomy Benchmark. The Best Farmer LLM? OpenAI, but Mistral... Dec 10, 2025 • 3
AutoBench Run 4 is out with Gemini 3 Pro, Gpt 5.1, Grok 4.1 etc. And the winner is not who you expect. Nov 28, 2025 • 1
AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark Oct 29, 2025 • 4
AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org Aug 20, 2025 • 6
AutoBench Run 2 Results are Out! Surprise: Gemini 2.5 Pro is not the Best Affordable Thinking Model Apr 29, 2025 • 6
Announcing MamayLM, an efficient state-of-the-art Ukrainian LLM Apr 23, 2025 • 64