FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation • Paper • 2601.13976 • Published 28 days ago • 21
Florence 2: Generate captions, detections, and segmentations from images • Running on Zero • Featured • 827