Hyper3-CLIP v0.5

Hyper3-CLIP v0.5 is an open-weight hyperbolic vision-language checkpoint from hyper³labs. It places image and text representations in a Lorentz space and was trained with compositional entailment constraints for hierarchy-sensitive image-text retrieval.
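For readers unfamiliar with Lorentz-space embeddings, the following is a minimal sketch of the standard hyperboloid (Lorentz) model: lifting a Euclidean vector onto the hyperboloid and measuring geodesic distance. It is illustrative only. The function names and the `curvature` default are assumptions made for this sketch, and the card does not state the curvature or the exact similarity function used by the checkpoint.

```python
import math

def lift_to_hyperboloid(space, curvature=1.0):
    """Return (time, space) satisfying -time^2 + |space|^2 = -1/curvature.

    `curvature` is the positive magnitude c of the (negative) curvature;
    its default of 1.0 is an assumption, not a value from the model card.
    """
    time = math.sqrt(1.0 / curvature + sum(s * s for s in space))
    return time, list(space)

def lorentz_inner(x, y):
    """Lorentzian inner product: -x_time * y_time + <x_space, y_space>."""
    (xt, xs), (yt, ys) = x, y
    return -xt * yt + sum(a * b for a, b in zip(xs, ys))

def lorentz_distance(x, y, curvature=1.0):
    """Geodesic distance between two points on the hyperboloid."""
    # Clamp to >= 1 so floating-point rounding cannot push acosh out of domain.
    arg = max(1.0, -curvature * lorentz_inner(x, y))
    return math.acosh(arg) / math.sqrt(curvature)

origin = lift_to_hyperboloid([0.0, 0.0])
point = lift_to_hyperboloid([math.sinh(1.0), 0.0])
print(lorentz_distance(origin, point))
```

In retrieval settings built on this geometry, a smaller distance (or a larger negative distance) is typically read as a closer image-text match.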

This v0.5 release is intended as an open baseline and research artifact.

Model

  • Architecture: ViT-B scale vision-language model
  • Vision backbone: vit_base_patch16_224
  • Text backbone: openai/clip-vit-base-patch32
  • Embedding dimension: 512
  • Training steps: 500,000
  • Global batch size: 768
  • Weights artifact: model.safetensors

The original full training checkpoint included optimizer, scheduler, AMP scaler, RNG state, config, and step metadata. This repository publishes the weights-only model.safetensors artifact for inference and downstream research.
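To check what a weights-only artifact contains before loading it, the safetensors header can be read directly. The sketch below relies only on the documented file layout (an 8-byte little-endian header length followed by a JSON header) and uses the Python standard library. The helper names are hypothetical, and the tensor names inside this particular checkpoint are not documented in the card.

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header

def summarize(header):
    """Map tensor name -> (dtype, shape), skipping the optional metadata entry."""
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }

if __name__ == "__main__":
    # Assumes model.safetensors has been downloaded to the working directory.
    for name, (dtype, shape) in summarize(read_safetensors_header("model.safetensors")).items():
        print(name, dtype, shape)
```

For actual inference, the tensors themselves can be loaded with `safetensors.torch.load_file("model.safetensors")` from the `safetensors` package. Mapping them onto a model definition depends on code that this card does not describe.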

Evaluation

The numbers below use the official evaluator convention for R@10. Higher is better for every metric except TIE and LCA, where lower is better.

| Model | Comparable setting | ImageNet top-1 | COCO text R@10 | COCO image R@10 | Flickr text R@10 | Flickr image R@10 | TIE | LCA | Jaccard | H-Prec | H-Rec |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MERU-B/16 | same-family baseline | 40.1 | 82.0 | 68.6 | 96.2 | 90.0 | 3.630 | 2.220 | 0.780 | 0.850 | 0.850 |
| HyCoCLIP-B/16 | official checkpoint | 45.8 | 82.0 | 69.3 | 95.4 | 90.3 | 3.172 | 2.047 | 0.814 | 0.874 | 0.874 |
| UNCHA-B/16 | official checkpoint | 48.8 | 82.6 | 71.0 | 95.9 | 91.2 | 2.945 | 1.961 | 0.828 | 0.883 | 0.884 |
| PHyCLIP-B/16 | related reported result | 44.4 | 80.4 | 68.7 | 95.6 | 89.9 | 3.285 | 2.088 | 0.807 | 0.868 | 0.868 |
| Hyper3-CLIP v0.5 | this release | 48.5 | 84.0 | 72.8 | 97.5 | 92.4 | 2.972 | 1.986 | 0.828 | 0.882 | 0.883 |
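As a reminder of what the R@10 columns measure, here is a generic recall@K sketch. It is not the official evaluator, whose exact convention the card does not spell out. It assumes a query counts as a hit when any of its correct gallery items appears in the top K by score, and `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(similarity, positives, k=10):
    """Percentage of queries with at least one correct item in the top k.

    similarity[q][i]: score of gallery item i for query q (higher = closer).
    positives[q]:     set of gallery indices that are correct for query q.
    """
    hits = 0
    for scores, pos in zip(similarity, positives):
        # Rank gallery indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if pos & set(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)

# Toy example: 2 queries, 3 gallery items, item 0 is correct for both.
sim = [[0.9, 0.1, 0.5],
       [0.2, 0.3, 0.8]]
print(recall_at_k(sim, [{0}, {0}], k=1))
```

With a distance-based model, the scores would typically be negated distances so that closer pairs rank first.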

Raw evaluation files are included:

  • eval_coco_karpathy_final.json
  • eval_flickr30k_final.json
  • eval_imagenet_final.json
  • eval_hycoclip_uncha_intersection_final.json
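The raw files can be inspected with a few lines of standard-library Python. The sketch below only loads whichever of the listed files are present and prints their top-level structure. It makes no assumption about the JSON schema, which the card does not document, and `load_eval_results` is a hypothetical helper.

```python
import json
from pathlib import Path

EVAL_FILES = [
    "eval_coco_karpathy_final.json",
    "eval_flickr30k_final.json",
    "eval_imagenet_final.json",
    "eval_hycoclip_uncha_intersection_final.json",
]

def load_eval_results(directory):
    """Load every listed evaluation file that exists in `directory`."""
    results = {}
    for name in EVAL_FILES:
        path = Path(directory) / name
        if path.exists():
            with path.open(encoding="utf-8") as f:
                results[name] = json.load(f)
    return results

if __name__ == "__main__":
    # Assumes the repository files were downloaded to the current directory.
    for name, payload in load_eval_results(".").items():
        keys = sorted(payload) if isinstance(payload, dict) else type(payload).__name__
        print(name, keys)
```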

License And Attribution

The model materials in this repository are released under OpenMDW-1.0. See LICENSE.

Redistributions should preserve NOTICE, LICENSE, and the original model card when practical. Modified or derived checkpoints should use a distinct name and must not imply endorsement by hyper³labs.

Please cite and link to the original hyper³labs model repository when publishing benchmarks, papers, derivative checkpoints, or public demos based on this model.

Intended Use

This release is intended for:

  • hierarchy-sensitive image-text retrieval research
  • zero-shot and retrieval evaluation
  • multimodal embedding baselines
  • downstream experiments with hyperbolic representation learning

This model has not been validated for safety-critical use.

Citation

If you use Hyper3-CLIP v0.5, cite the original model repository and hyper³labs.
