Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment • Paper 2605.08064 • Published 11 days ago