Image Labels: One-shot image-conditioned object detection

by godaspeg - opened Jun 26, 2024


Is it possible to detect objects using images as labels instead of text? Since OWL-ViT is based on CLIP embeddings, I think this should be theoretically possible.

godaspeg changed discussion title from Image Labels to Image Labels: One-shot image-conditioned object detection Jun 26, 2024

nielsr

Jun 27, 2024

Yes, image-guided object detection is supported. See the demo notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/OWLv2/Zero_and_one_shot_object_detection_with_OWLv2.ipynb
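For readers who want a starting point before opening the notebook, here is a minimal sketch of one-shot, image-conditioned detection with the OWLv2 classes in transformers. It is not taken from the notebook. The checkpoint name, the threshold values, and the helper names `rank_detections` and `detect_with_query_image` are assumptions chosen for illustration. OWLv2 pads inputs to a square before resizing, so it is worth checking the returned box coordinates against the notebook for your transformers version.

```python
from typing import List, Tuple

# (score, [xmin, ymin, xmax, ymax]) for one detected box
Detection = Tuple[float, List[float]]


def rank_detections(scores: List[float], boxes: List[List[float]]) -> List[Detection]:
    """Pair each score with its box and sort by descending score."""
    return sorted(zip(scores, boxes), key=lambda d: d[0], reverse=True)


def detect_with_query_image(
    target_path: str,
    query_path: str,
    checkpoint: str = "google/owlv2-base-patch16-ensemble",  # assumed checkpoint
    threshold: float = 0.9,      # assumed value, tune for your data
    nms_threshold: float = 0.3,  # assumed value, tune for your data
) -> List[Detection]:
    """Find regions of the target image that resemble the query image."""
    # Heavy dependencies are imported here so rank_detections works without them.
    import torch
    from PIL import Image
    from transformers import Owlv2ForObjectDetection, Owlv2Processor

    processor = Owlv2Processor.from_pretrained(checkpoint)
    model = Owlv2ForObjectDetection.from_pretrained(checkpoint)

    target = Image.open(target_path).convert("RGB")
    query = Image.open(query_path).convert("RGB")

    # The query image takes the place of text labels.
    inputs = processor(images=target, query_images=query, return_tensors="pt")
    with torch.no_grad():
        outputs = model.image_guided_detection(**inputs)

    # PIL reports (width, height); post-processing expects (height, width).
    target_sizes = torch.tensor([target.size[::-1]])
    results = processor.post_process_image_guided_detection(
        outputs=outputs,
        threshold=threshold,
        nms_threshold=nms_threshold,
        target_sizes=target_sizes,
    )[0]
    return rank_detections(results["scores"].tolist(), results["boxes"].tolist())
```

Calling `detect_with_query_image("scene.jpg", "example_object.jpg")` (both file names hypothetical) would download the checkpoint on first use and return the matched boxes, highest score first. A query image cropped tightly around a single object usually makes a clearer "label" than a cluttered one.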

godaspeg changed discussion status to closed Oct 12, 2024
