T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning Paper • 2603.03790 • Published 2 days ago • 105
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools Paper • 2509.09734 • Published Sep 10, 2025 • 16 • 4