UI2Code$^\text{N}$: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation Paper • 2511.08195 • Published 29 days ago • 31 • 4
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 242 • 4
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web Paper • 2402.17553 • Published Feb 27, 2024 • 25 • 6