CU-Benchmarks
visualwebbench/VisualWebBench
rootsautomation/RICO-ScreenQA
rootsautomation/ScreenSpot
osunlp/Multimodal-Mind2Web
xlangai/ubuntu_osworld_file_cache
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale (arXiv:2409.08264)
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents (arXiv:2405.14573)