IndexTTS-1.5-vLLM

This is the IndexTTS-1.5 model converted for vLLM acceleration.

Model Description

IndexTTS-1.5 is a high-quality text-to-speech model optimized with vLLM for fast inference.

Performance

  • RTF (Real-Time Factor): ~0.10 for a single request, i.e. 10x faster than real-time (see the measurement sketch below)
  • Peak Throughput: 8.49 requests/second at 96 concurrent requests
  • Concurrency: Supports 128+ concurrent requests
  • Speedup: 52x aggregate speedup over real-time at peak load
  • GPU: Tested on an NVIDIA L40S at 75% GPU memory utilization
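
RTF is the ratio of synthesis time to the duration of the audio produced, so values below 1.0 mean faster than real-time. A minimal sketch of how to measure it against this server (the helper names and the output.wav path are illustrative, not part of this repository):

import time
import wave

def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    # RTF = time spent generating / duration of the audio produced;
    # RTF 0.10 means 1 s of speech takes ~0.1 s to synthesize.
    return synthesis_seconds / audio_seconds

def wav_duration(path: str) -> float:
    # Duration of a WAV file in seconds.
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

start = time.perf_counter()
# ... call the TTS endpoint and write output.wav (see API Example below) ...
elapsed = time.perf_counter() - start
print(f"RTF: {real_time_factor(elapsed, wav_duration('output.wav')):.3f}")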

Usage

Installation

pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install vllm==0.10.2
pip install -r requirements.txt
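
To confirm the environment matches the versions this card was tested with, a quick sanity check run in the same Python environment:

import torch
import vllm

print("torch:", torch.__version__)            # expected: 2.8.0+cu126
print("CUDA available:", torch.cuda.is_available())
print("vllm:", vllm.__version__)              # expected: 0.10.2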

Running the Server

python api_server.py \
  --model_dir /path/to/IndexTTS-1.5-vLLM \
  --host 0.0.0.0 \
  --port 6006 \
  --gpu_memory_utilization 0.75
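
Loading weights and allocating the KV cache can take a while; one generic way to wait for the server is to poll its TCP port from Python (a plain socket check, not an endpoint documented by this repository):

import socket
import time

def wait_for_server(host: str = "localhost", port: int = 6006, timeout: float = 300.0) -> None:
    # Poll until the port accepts TCP connections or the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return
        except OSError:
            time.sleep(2.0)
    raise TimeoutError(f"server did not come up on {host}:{port}")

wait_for_server()
print("Server is accepting connections.")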

API Example

import requests

# Synthesize speech in the voice of the reference audio.
url = "http://localhost:6006/tts_url"
data = {
    "text": "Hello, this is IndexTTS speaking!",
    "audio_paths": ["reference_audio.wav"],  # reference voice sample(s)
}

response = requests.post(url, json=data, timeout=120)
response.raise_for_status()  # fail loudly on HTTP errors

# The response body is the synthesized audio.
with open("output.wav", "wb") as f:
    f.write(response.content)
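
To exercise the concurrency levels reported in the benchmarks below, requests can be fired in parallel from the client. A minimal throughput sketch using a thread pool (same endpoint and payload as above; the request count of 32 is arbitrary):

import time
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost:6006/tts_url"
PAYLOAD = {"text": "Hello, this is IndexTTS speaking!", "audio_paths": ["reference_audio.wav"]}
N_REQUESTS = 32  # one of the concurrency levels in the benchmark table

def synthesize(_: int) -> int:
    # Each call returns the size of the audio payload it received.
    r = requests.post(URL, json=PAYLOAD, timeout=300)
    r.raise_for_status()
    return len(r.content)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=N_REQUESTS) as pool:
    sizes = list(pool.map(synthesize, range(N_REQUESTS)))
elapsed = time.perf_counter() - start
print(f"{N_REQUESTS} requests in {elapsed:.1f} s -> {N_REQUESTS / elapsed:.2f} req/s")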

Model Details

  • Base Model: IndexTTS-1.5
  • Framework: vLLM 0.10.2
  • PyTorch Version: 2.8.0+cu126
  • CUDA Version: 12.6
  • KV Cache: 283,200 tokens (75% GPU utilization)
  • Max Sequence Length: 803 tokens

Original Model

Based on IndexTeam/IndexTTS-1.5

Citation

If you use this model, please cite:

@misc{indextts-vllm,
  title={IndexTTS-1.5 with vLLM Acceleration},
  author={Original: IndexTeam, vLLM conversion: Community},
  year={2024},
  howpublished={\url{https://huggingface.co/Hariprasath28/IndexTTS-1.5-vLLM}}
}

License

MIT License

Performance Benchmarks

Single Request

  • RTF: ~0.10 (10x faster than real-time)

High Concurrency (75% GPU memory utilization)

Concurrent Requests    Throughput    RTF      Speedup
16                     8.18 req/s    0.225    50.2x
32                     8.23 req/s    0.382    50.5x
64                     8.39 req/s    0.683    51.5x
96                     8.49 req/s    0.979    52.1x
128                    8.45 req/s    1.298    51.8x

Repository

Original vLLM implementation: index-tts-vllm
