Peter Kruger PRO

PeterKruger

http://pwk.it

AI & ML interests

Neural networks (since 1993), LLMs, AI-based financial analysis, LLM Benchmarks

Recent Activity

upvoted an article 2 days ago

Article

Announcing AutoBench Agentic: The Next Generation Agentic Benchmark.

2 days ago

published an article 2 days ago

Article

Announcing AutoBench Agentic: The Next Generation Agentic Benchmark.

2 days ago

updated a Space 3 days ago

AutoBench Leaderboard

👀

Multi-run AutoBench leaderboard with historical navigation

upvoted an article 4 months ago

Article

Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2.

Dec 17, 2025

published an article 4 months ago

Article

Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2.

Dec 17, 2025

upvoted an article 4 months ago

Article

AutoBench Goes to the Farm with Evja: The First Ever Agronomy Benchmark. The Best Farmer LLM? OpenAI, but Mistral...

Dec 10, 2025

published an article 4 months ago

Article

AutoBench Goes to the Farm with Evja: The First Ever Agronomy Benchmark. The Best Farmer LLM? OpenAI, but Mistral...

Dec 10, 2025

upvoted an article 5 months ago

Article

AutoBench Run 4 is out with Gemini 3 Pro, Gpt 5.1, Grok 4.1 etc. And the winner is not who you expect.

Nov 28, 2025

published an article 5 months ago

Article

AutoBench Run 4 is out with Gemini 3 Pro, Gpt 5.1, Grok 4.1 etc. And the winner is not who you expect.

Nov 28, 2025

New activity in AutoBench/AutoBench-Leaderboard 6 months ago

Thanks for your leaderboard :)

#1 opened 6 months ago by zhiminy

upvoted an article 6 months ago

Article

AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark

Oct 29, 2025

published an article 6 months ago

Article

AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark

Oct 29, 2025

published an article 8 months ago

Article

AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org

Aug 20, 2025

published an article 11 months ago

Article

Introducing Bot Scanner: A "Skyscanner" for LLM answers

Jun 4, 2025

published an article 12 months ago

Article

AutoBench Run 2 Results are Out! Surprise: Gemini 2.5 Pro is not the Best Affordable Thinking Model

Apr 29, 2025

published a Space 12 months ago

AutoBench Leaderboard

👀

Multi-run AutoBench leaderboard with historical navigation

upvoted an article 12 months ago

Article

Announcing MamayLM, an efficient state-of-the-art Ukrainian LLM

Apr 23, 2025

updated a Space about 1 year ago

README

😻

updated a model about 1 year ago

AutoBench/AutoBench_1.0

Updated Mar 7, 2025 • 2

commented on Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!) about 1 year ago

Nice and fully accurate. Excellent job. Thanks!
