MentalBench: A Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models Paper • 2602.12871 • Published Feb 13 • 17
MolmoAct2: Action Reasoning Models for Real-world Deployment Paper • 2605.02881 • Published 20 days ago • 335