MultiShotMaster: A Controllable Multi-Shot Video Generation Framework Paper • 2512.03041 • Published 7 days ago • 60
Open Multimodal Retrieval-Augmented Factual Image Generation Paper • 2510.22521 • Published Oct 26 • 30
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7 • 74
Learning Getting-Up Policies for Real-World Humanoid Robots Paper • 2502.12152 • Published Feb 17 • 42
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
Focus Anywhere for Fine-grained Multi-page Document Understanding Paper • 2405.14295 • Published May 23, 2024 • 1
GOT Online: Extract text from images using various OCR modes • 359
Merlin: Empowering Multimodal LLMs with Foresight Minds Paper • 2312.00589 • Published Nov 30, 2023 • 27
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models Paper • 2312.06109 • Published Dec 11, 2023 • 21
OneChart: Purify the Chart Structural Extraction via One Auxiliary Token Paper • 2404.09987 • Published Apr 15, 2024 • 2