# IndexTTS-1.5-vLLM
This is the IndexTTS-1.5 model converted for vLLM acceleration.
## Model Description
IndexTTS-1.5 is a high-quality text-to-speech model optimized with vLLM for fast inference.
## Performance

- **RTF (Real-Time Factor):** ~0.10 for a single request (10x faster than real-time)
- **Peak throughput:** 8.49 requests/second at 96 concurrent requests
- **Concurrency:** supports 128+ concurrent requests
- **Aggregate speedup:** 52x real-time audio generation at peak load
- **GPU:** tested on an L40S at 75% GPU memory utilization
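The RTF figure works out as follows; a minimal sketch with hypothetical timings chosen to match the ~0.10 value above:

```python
# RTF (real-time factor) = wall-clock synthesis time / duration of audio produced.
# The timings below are hypothetical, picked to illustrate the ~0.10 figure.
synthesis_seconds = 0.6   # time spent generating the clip
audio_seconds = 6.0       # length of the generated clip

rtf = synthesis_seconds / audio_seconds        # lower is better; < 1.0 is faster than real-time
speedup = audio_seconds / synthesis_seconds    # inverse of RTF

print(f"RTF = {rtf:.2f} ({speedup:.0f}x faster than real-time)")
```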
## Usage

### Installation

```bash
pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install vllm==0.10.2
pip install -r requirements.txt
```
### Running the Server

```bash
python api_server.py \
    --model_dir /path/to/IndexTTS-1.5-vLLM \
    --host 0.0.0.0 \
    --port 6006 \
    --gpu_memory_utilization 0.75
```
### API Example

```python
import requests

url = "http://localhost:6006/tts_url"
data = {
    "text": "Hello, this is IndexTTS speaking!",
    "audio_paths": ["reference_audio.wav"],
}

response = requests.post(url, json=data)
response.raise_for_status()  # fail loudly on an error status

with open("output.wav", "wb") as f:
    f.write(response.content)
```
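The concurrency figures above can be exercised with a simple fan-out client. A sketch, assuming the server from "Running the Server" is listening on localhost:6006 and `reference_audio.wav` is a placeholder reference clip; it degrades gracefully when the server is not reachable:

```python
# Fan out several /tts_url requests in parallel with a thread pool; a sketch,
# not a benchmark harness. Assumes the API server above is running locally.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:6006/tts_url"

def synthesize(i: int) -> bytes:
    payload = {
        "text": f"Concurrent request number {i}.",
        "audio_paths": ["reference_audio.wav"],  # placeholder reference clip
    }
    resp = requests.post(URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.content

results = []
try:
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = list(pool.map(synthesize, range(16)))
except requests.RequestException:
    pass  # server not reachable; nothing to write

for i, wav in enumerate(results):
    with open(f"output_{i}.wav", "wb") as f:
        f.write(wav)
```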
## Model Details

- **Base model:** IndexTTS-1.5
- **Framework:** vLLM 0.10.2
- **PyTorch version:** 2.8.0+cu126
- **CUDA version:** 12.6
- **KV cache:** 283,200 tokens (at 75% GPU memory utilization)
- **Max sequence length:** 803 tokens
## Original Model

Based on [IndexTeam/IndexTTS-1.5](https://huggingface.co/IndexTeam/IndexTTS-1.5).
## Citation

If you use this model, please cite:

```bibtex
@misc{indextts-vllm,
  title={IndexTTS-1.5 with vLLM Acceleration},
  author={Original: IndexTeam, vLLM conversion: Community},
  year={2024},
  howpublished={\url{https://huggingface.co/Hariprasath28/IndexTTS-1.5-vLLM}}
}
```
## License

MIT License
## Performance Benchmarks

### Single Request

- RTF: ~0.10 (10x faster than real-time)

### High Concurrency (75% GPU memory utilization)

| Concurrent Requests | Throughput | Per-Request RTF | Aggregate Speedup |
|---|---|---|---|
| 16 | 8.18 req/s | 0.225 | 50.2x |
| 32 | 8.23 req/s | 0.382 | 50.5x |
| 64 | 8.39 req/s | 0.683 | 51.5x |
| 96 | 8.49 req/s | 0.979 | 52.1x |
| 128 | 8.45 req/s | 1.298 | 51.8x |
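The columns of the table are linked by a simple identity: aggregate speedup is the seconds of audio produced per wall-clock second, i.e. throughput times average clip length. The average clip length is not stated in this card, so the sketch below infers it from the 96-concurrent row and cross-checks it against another row:

```python
# Aggregate speedup = throughput * average clip length (seconds of audio
# generated per wall-clock second). The average clip length below is
# inferred from the table, not stated in the card.
throughput_96 = 8.49   # req/s at 96 concurrent
speedup_96 = 52.1      # x real-time at 96 concurrent

avg_clip_seconds = speedup_96 / throughput_96
print(f"implied average clip length: {avg_clip_seconds:.1f} s")

# Cross-check against the 64-concurrent row: 8.39 req/s should then give
# roughly 51.5x, matching the table.
predicted_speedup_64 = 8.39 * avg_clip_seconds
print(f"predicted speedup at 64 concurrent: {predicted_speedup_64:.1f}x")
```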
## Repository

Original vLLM implementation: index-tts-vllm