Hyper3-CLIP v0.5

Hyper3-CLIP v0.5 is an open-weight hyperbolic vision-language checkpoint from hyper³labs. It places image and text representations in a Lorentz space and was trained with compositional entailment constraints for hierarchy-sensitive image-text retrieval.
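As a rough illustration of what placing embeddings in a Lorentz space means, the sketch below lifts a Euclidean vector onto the hyperboloid with the exponential map at the origin and measures geodesic distance through the Lorentzian inner product. The fixed curvature parameter `c`, the function names, and the choice of this particular lifting are assumptions for illustration; the released checkpoint may use a learned curvature or a different parameterization.

```python
import math

def exp_map_origin(v, c=1.0):
    """Lift a Euclidean (tangent) vector onto the hyperboloid of curvature -c.

    Returns (time, space). Hypothetical helper, not the model's own code.
    """
    r = math.sqrt(sum(x * x for x in v))
    sc = math.sqrt(c)
    # sinh(t)/t tends to 1 as t -> 0, so the zero vector maps to the origin.
    scale = 1.0 if r == 0.0 else math.sinh(sc * r) / (sc * r)
    space = [scale * x for x in v]
    # The time component is fixed by the constraint <x, x>_L = -1/c.
    time = math.sqrt(1.0 / c + sum(x * x for x in space))
    return time, space

def lorentz_inner(a, b):
    """Lorentzian inner product: space dot product minus product of times."""
    (ta, sa), (tb, sb) = a, b
    return sum(x * y for x, y in zip(sa, sb)) - ta * tb

def lorentz_distance(a, b, c=1.0):
    """Geodesic distance between two points on the hyperboloid."""
    # Clamp to 1.0 to guard acosh against floating-point rounding.
    arg = max(1.0, -c * lorentz_inner(a, b))
    return math.acosh(arg) / math.sqrt(c)
```

Under these assumptions, the distance from the origin to a lifted point equals the Euclidean norm of the original vector, which is a convenient sanity check.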

This v0.5 release is intended as an open baseline and research artifact.

Model

Architecture: ViT-B scale vision-language model
Vision backbone: vit_base_patch16_224
Text backbone: openai/clip-vit-base-patch32
Embedding dimension: 512
Training steps: 500,000
Global batch size: 768
Weights artifact: model.safetensors

The original full training checkpoint included optimizer, scheduler, AMP scaler, RNG state, config, and step metadata. This repository publishes the weights-only model.safetensors artifact for inference and downstream research.
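To check what a weights-only artifact contains without loading any tensors, one option is to read the safetensors header directly. A safetensors file starts with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header that maps tensor names to dtype, shape, and data offsets. The helper names below are hypothetical, and the sketch uses only the standard library.

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian u64 with the header length in bytes.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))

def count_parameters(header):
    """Sum the element counts of all tensors listed in the header."""
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total
```

Calling `count_parameters(read_safetensors_header("model.safetensors"))` on a downloaded copy gives a total parameter count, and the header keys show that only model tensors are present.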

Evaluation

The numbers below use the official evaluator convention for R@10. Higher is better for every column except TIE and LCA, where lower is better.

Model	Comparable setting	ImageNet top-1	COCO text R@10	COCO image R@10	Flickr text R@10	Flickr image R@10	TIE	LCA	Jaccard	H-Prec	H-Rec
MERU-B/16	same-family baseline	40.1	82.0	68.6	96.2	90.0	3.630	2.220	0.780	0.850	0.850
HyCoCLIP-B/16	official checkpoint	45.8	82.0	69.3	95.4	90.3	3.172	2.047	0.814	0.874	0.874
UNCHA-B/16	official checkpoint	48.8	82.6	71.0	95.9	91.2	2.945	1.961	0.828	0.883	0.884
PHyCLIP-B/16	related reported result	44.4	80.4	68.7	95.6	89.9	3.285	2.088	0.807	0.868	0.868
Hyper3-CLIP v0.5	this release	48.5	84.0	72.8	97.5	92.4	2.972	1.986	0.828	0.882	0.883
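For readers unfamiliar with the R@10 columns, the following is a minimal sketch of recall at K under one common convention: a query counts as a hit if any of its ground-truth items appears among the K highest-scoring gallery items. It is not the official evaluator, whose exact convention is not described here. The function name and input layout are assumptions. For a hyperbolic model the score would typically be a negative distance, which is also an assumption.

```python
def recall_at_k(scores, positives, k=10):
    """Percentage of queries with at least one positive in the top-k.

    scores:    one list of similarity scores per query, indexed by gallery item
    positives: one set of ground-truth gallery indices per query
    """
    hits = 0
    for row, pos in zip(scores, positives):
        # Rank gallery indices by descending similarity and keep the top k.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if any(i in pos for i in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(scores)
```

With several captions per image, as in COCO and Flickr30k, text retrieval passes a set of caption indices per image query, while image retrieval passes a single image index per caption query.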

Raw evaluation files are included:

eval_coco_karpathy_final.json
eval_flickr30k_final.json
eval_imagenet_final.json
eval_hycoclip_uncha_intersection_final.json

License And Attribution

The model materials in this repository are released under OpenMDW-1.0. See LICENSE.

Redistributions should preserve NOTICE, LICENSE, and the original model card when practical. Modified or derived checkpoints should use a distinct name and must not imply endorsement by hyper³labs.

Please cite and link to the original hyper³labs model repository when publishing benchmarks, papers, derivative checkpoints, or public demos based on this model.

Intended Use

This release is intended for:

hierarchy-sensitive image-text retrieval research
zero-shot and retrieval evaluation
multimodal embedding baselines
downstream experiments with hyperbolic representation learning

This model has not been validated for safety-critical use.

Citation

If you use Hyper3-CLIP v0.5, cite the original model repository and hyper³labs.

Safetensors

Model size: 0.1B params
Tensor type: F32
