Hyper3-CLIP v0.5
Hyper3-CLIP v0.5 is an open-weight hyperbolic vision-language checkpoint from hyper³labs. It places image and text representations in a Lorentz space and was trained with compositional entailment constraints for hierarchy-sensitive image-text retrieval.
This v0.5 release is intended as an open baseline and research artifact.
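To make the Lorentz-space wording above concrete, here is a minimal sketch of the geodesic distance on the hyperboloid model. It assumes a common convention in which only the space components of an embedding are stored and the time component is recomputed from them. The function names, the `curv` parameter, and its default of 1.0 are illustrative assumptions, not this checkpoint's actual API or trained curvature.

```python
import math

def time_component(space, curv=1.0):
    """Time coordinate that places `space` on the hyperboloid <x, x>_L = -1/curv."""
    return math.sqrt(1.0 / curv + sum(v * v for v in space))

def lorentz_inner(x_space, y_space, curv=1.0):
    """Lorentzian inner product: space dot product minus product of time coordinates."""
    dot = sum(a * b for a, b in zip(x_space, y_space))
    return dot - time_component(x_space, curv) * time_component(y_space, curv)

def lorentz_distance(x_space, y_space, curv=1.0):
    """Geodesic distance between two points given by their space components."""
    # Clamp to 1.0 so rounding error cannot push acosh outside its domain.
    arg = max(1.0, -curv * lorentz_inner(x_space, y_space, curv))
    return math.acosh(arg) / math.sqrt(curv)
```

For retrieval, a smaller `lorentz_distance` between an image embedding and a text embedding would indicate a closer match; whether the released model scores pairs with this distance or with the inner product directly is not stated here.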
Model
- Architecture: ViT-B scale vision-language model
- Vision backbone: vit_base_patch16_224
- Text backbone: openai/clip-vit-base-patch32
- Embedding dimension: 512
- Training steps: 500,000
- Global batch size: 768
- Weights artifact: model.safetensors
The original full training checkpoint included optimizer, scheduler, AMP scaler,
RNG state, config, and step metadata. This repository publishes the weights-only
model.safetensors artifact for inference and downstream research.
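Because only the weights file is published, it can be useful to check which tensors it contains before writing loading code. The sketch below reads the JSON header of a safetensors file using only the standard library. It relies on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper names are hypothetical, and the tensor names inside this particular checkpoint are not documented here.

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))

def list_tensors(path):
    """Map tensor name -> (dtype, shape), skipping the optional __metadata__ entry."""
    header = read_safetensors_header(path)
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }
```

Calling `list_tensors("model.safetensors")` on a downloaded copy would list the parameter names and shapes without loading any weight data into memory.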
Evaluation
The numbers below use the official evaluator convention for R@10. Higher is better for every metric except TIE and LCA, where lower is better.
| Model | Comparable setting | ImageNet top-1 | COCO text R@10 | COCO image R@10 | Flickr text R@10 | Flickr image R@10 | TIE | LCA | Jaccard | H-Prec | H-Rec |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MERU-B/16 | same-family baseline | 40.1 | 82.0 | 68.6 | 96.2 | 90.0 | 3.630 | 2.220 | 0.780 | 0.850 | 0.850 |
| HyCoCLIP-B/16 | official checkpoint | 45.8 | 82.0 | 69.3 | 95.4 | 90.3 | 3.172 | 2.047 | 0.814 | 0.874 | 0.874 |
| UNCHA-B/16 | official checkpoint | 48.8 | 82.6 | 71.0 | 95.9 | 91.2 | 2.945 | 1.961 | 0.828 | 0.883 | 0.884 |
| PHyCLIP-B/16 | related reported result | 44.4 | 80.4 | 68.7 | 95.6 | 89.9 | 3.285 | 2.088 | 0.807 | 0.868 | 0.868 |
| Hyper3-CLIP v0.5 | this release | 48.5 | 84.0 | 72.8 | 97.5 | 92.4 | 2.972 | 1.986 | 0.828 | 0.882 | 0.883 |
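For readers unfamiliar with the R@10 columns, the following is a generic sketch of recall at K: the fraction of queries whose top K retrieved items include at least one correct match. It is not the official evaluator referenced above, whose exact convention may differ; the function name and input layout are assumptions made for illustration.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of queries with at least one relevant item among their top-k results.

    ranked_ids:   one list per query, item ids sorted from best to worst match.
    relevant_ids: one set per query, the ids that count as correct for that query.
    """
    hits = 0
    for ranking, relevant in zip(ranked_ids, relevant_ids):
        if any(item in relevant for item in ranking[:k]):
            hits += 1
    return hits / len(ranked_ids)
```

With two toy queries ranked as `[[3, 1, 2], [0, 2, 1]]` and item `1` correct for both, `recall_at_k(..., k=2)` gives 0.5, since only the first query has its match in the top two. The table reports such values as percentages.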
Raw evaluation files are included:
- eval_coco_karpathy_final.json
- eval_flickr30k_final.json
- eval_imagenet_final.json
- eval_hycoclip_uncha_intersection_final.json
License And Attribution
The model materials in this repository are released under OpenMDW-1.0. See
LICENSE.
Redistributions should preserve NOTICE, LICENSE, and the original model card
when practical. Modified or derived checkpoints should use a distinct name and
must not imply endorsement by hyper³labs.
Please cite and link to the original hyper³labs model repository when publishing benchmarks, papers, derivative checkpoints, or public demos based on this model.
Intended Use
This release is intended for:
- hierarchy-sensitive image-text retrieval research
- zero-shot and retrieval evaluation
- multimodal embedding baselines
- downstream experiments with hyperbolic representation learning
This model has not been validated for safety-critical use.
Citation
If you use Hyper3-CLIP v0.5, cite the original model repository and hyper³labs.