# Hub API
The Hub provides a few APIs to interact with Inference Providers. Here is a list of them:
## List models
To list models powered by a provider, use the `inference_provider` query parameter:
```bash
# List all models served by Fireworks AI
~ curl -s 'https://huggingface.co/api/models?inference_provider=fireworks-ai' | jq ".[].id"
"deepseek-ai/DeepSeek-V3-0324"
"deepseek-ai/DeepSeek-R1"
"Qwen/QwQ-32B"
"deepseek-ai/DeepSeek-V3"
...
```
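The same filter is available from Python; here is a minimal sketch using `huggingface_hub`'s `list_models` (assuming a recent release, where it accepts an `inference_provider` argument):

```python
# Sketch: list models served by a given provider via huggingface_hub
# (assumes a recent release where list_models accepts inference_provider).
from huggingface_hub import list_models

for model in list_models(inference_provider="fireworks-ai", limit=5):
    print(model.id)
```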
It can be combined with other filters to, e.g., select only text-to-image models:

```bash
# List text-to-image models served by Fal AI
~ curl -s 'https://huggingface.co/api/models?inference_provider=fal-ai&pipeline_tag=text-to-image' | jq ".[].id"
"black-forest-labs/FLUX.1-dev"
"stabilityai/stable-diffusion-3.5-large"
"black-forest-labs/FLUX.1-schnell"
"stabilityai/stable-diffusion-3.5-large-turbo"
...
```

Pass a comma-separated list of providers to select multiple:
```bash
# List image-text-to-text models served by Novita or Sambanova
~ curl -s 'https://huggingface.co/api/models?inference_provider=sambanova,novita&pipeline_tag=image-text-to-text' | jq ".[].id"
"meta-llama/Llama-3.2-11B-Vision-Instruct"
"meta-llama/Llama-3.2-90B-Vision-Instruct"
"Qwen/Qwen2-VL-72B-Instruct"
```

Finally, you can select all models served by at least one inference provider:
```bash
# List text-to-video models served by any provider
~ curl -s 'https://huggingface.co/api/models?inference_provider=all&pipeline_tag=text-to-video' | jq ".[].id"
"Wan-AI/Wan2.1-T2V-14B"
"Lightricks/LTX-Video"
"tencent/HunyuanVideo"
"Wan-AI/Wan2.1-T2V-1.3B"
"THUDM/CogVideoX-5b"
"genmo/mochi-1-preview"
"BagOu22/Lora_HKLPAZ"
```

## Get model status
To find an inference provider for a specific model, request the `inference` attribute in the model info endpoint:
In `huggingface_hub`, use `model_info` with the `expand` parameter:
```python
>>> from huggingface_hub import model_info
>>> info = model_info("google/gemma-3-27b-it", expand="inference")
>>> info.inference
'warm'
```

The inference status is either `"warm"` or undefined:
In `huggingface_hub`, use `model_info` with the `expand` parameter:
```python
>>> from huggingface_hub import model_info
>>> info = model_info("manycore-research/SpatialLM-Llama-1B", expand="inference")
>>> info.inference
None
```
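This makes it easy to guard calls on availability; a minimal sketch (the `is_served` helper is ours, not part of the library):

```python
# Sketch: check whether at least one provider currently serves a model
# ("warm") before calling it. is_served is a hypothetical helper.
from huggingface_hub import model_info

def is_served(model_id: str) -> bool:
    return model_info(model_id, expand="inference").inference == "warm"

print(is_served("google/gemma-3-27b-it"))                  # True if served
print(is_served("manycore-research/SpatialLM-Llama-1B"))   # False if not
```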
## List OpenAI-compatible models

The router exposes an OpenAI-compatible endpoint to list chat-completion models served by Inference Providers, together with provider metadata used for routing and comparison:
```bash
~ curl -s https://router.huggingface.co/v1/models | jq '.data'
{
  "id": "deepseek-ai/DeepSeek-V4-Pro",
  "object": "model",
  "created": 1776837885,
  "owned_by": "deepseek-ai",
  "architecture": {
    "input_modalities": [
      "text"
    ],
    "output_modalities": [
      "text"
    ]
  },
  "providers": [
    {
      "provider": "novita",
      "status": "live",
      "context_length": 1048576,
      "pricing": {
        "input": 1.69,
        "output": 3.38
      },
      "supports_tools": true,
      "supports_structured_output": false,
      "first_token_latency_ms": 1490,
      "throughput": 24.69124008437934,
      "is_model_author": false
    },
    {
      "provider": "together",
      "status": "live",
      "context_length": 512000,
      "pricing": {
        "input": 2.1,
        "output": 4.4
      },
      "supports_tools": true,
      "supports_structured_output": true,
      "first_token_latency_ms": 611,
      "throughput": 40.13113557470821,
      "is_model_author": false
    },
    {
      "provider": "fireworks-ai",
      "status": "live",
      "context_length": 1048576,
      "supports_tools": true,
      "supports_structured_output": true,
      "first_token_latency_ms": 588.4,
      "throughput": 45.42948747753918,
      "is_model_author": false
    },
    {
      "provider": "featherless-ai",
      "status": "live",
      "is_model_author": false
    },
    {
      "provider": "deepinfra",
      "status": "live",
      "context_length": 65536,
      "pricing": {
        "input": 1.74,
        "output": 3.48
      },
      "supports_tools": true,
      "supports_structured_output": true,
      "first_token_latency_ms": 525.8,
      "throughput": 34.615908579189465,
      "is_model_author": false
    }
  ]
}
...
```

To retrieve a single model, append its model ID to the endpoint:
```bash
~ curl -s https://router.huggingface.co/v1/models/deepseek-ai/DeepSeek-V4-Pro | jq '.'
```

Each provider entry may include the following fields:
| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider identifier |
| `status` | string | `live` or `error` |
| `context_length` | number | Maximum context length supported by this provider for the model, when available |
| `pricing` | object | `input` and `output` prices in USD per million tokens, when available |
| `supports_tools` | boolean | Whether the provider supports tool calling, when available |
| `supports_structured_output` | boolean | Whether the provider supports structured output, when available |
| `first_token_latency_ms` | number | Time to first token in milliseconds, from the latest validation probe, when available |
| `throughput` | number | Output throughput in tokens per second, from the latest validation probe, when available |
| `is_model_author` | boolean | Whether the model was published by this provider |
These metrics are the same provider performance signals shown in the provider comparison table. They are optional because some providers or models may not have the latest probe data available.
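Because the endpoint is OpenAI-compatible, any OpenAI-style client can consume it. Here is a minimal sketch using the `openai` Python package, assuming a Hugging Face token in the `HF_TOKEN` environment variable (the model ID below reuses the example above):

```python
# Sketch: browse router models with the openai client
# (HF_TOKEN and the model ID below are assumptions for illustration).
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# List all chat-completion models served by Inference Providers.
for model in client.models.list():
    print(model.id)

# Retrieve a single model by its ID.
model = client.models.retrieve("deepseek-ai/DeepSeek-V4-Pro")
print(model.id)
```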
## Get model providers
If you are interested in a specific model and want to check the list of providers serving it, request the `inferenceProviderMapping` attribute in the model info endpoint:
In `huggingface_hub`, use `model_info` with the `expand` parameter:
```python
>>> from huggingface_hub import model_info
>>> info = model_info("google/gemma-3-27b-it", expand="inferenceProviderMapping")
>>> info.inference_provider_mapping
{
    'featherless-ai': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it', task='conversational'),
    'scaleway': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it-fast', task='conversational'),
}
```

Each provider serving the model shows a status (`staging` or `live`), the related task (here, `conversational`) and its `provider_id`. In practice, this information is relevant for the JS and Python clients.
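For instance, the Python client uses this mapping when you pin a provider explicitly; a minimal sketch, assuming a recent `huggingface_hub` release and a valid token (the provider choice is just an example):

```python
# Sketch: pin one of the providers from the mapping above
# (provider and prompt are illustrative assumptions).
from huggingface_hub import InferenceClient

client = InferenceClient(provider="featherless-ai")
completion = client.chat.completions.create(
    model="google/gemma-3-27b-it",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```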