- MMLU-Pro Leaderboard (245): More advanced and challenging multi-task evaluation
- Stick To Your Role! Leaderboard (62): Benchmarking LLMs on the stability of simulated populations
- ZeroEval Leaderboard (53): Explore ZeroEval embedding benchmark online
- Decentralized Arena Leaderboard (26): View and compare LLM evaluations across various domains
Hristo Panev
hppdqdq
AI & ML interests
None yet
Recent Activity
- Liked a model 5 days ago: Nimbz/Gemma-4-Gembrain-31B-GGUF
- Liked a model about 1 month ago: froggeric/Qwen-Fixed-Chat-Templates
- Liked a model 4 months ago: Comfy-Org/ace_step_1.5_ComfyUI_files
Organizations
None yet