XTTS French Voice Cloning - Optimized Model

XTTS v2 model optimized for French voice cloning.

Training Results

Metric	Value
Epochs	100
Batch Size	4
Learning Rate	2e-6
Initial Loss	2.221
Final Loss	0.0166
Improvement	99.25% (133x reduction)
Dataset	16 audio files (~42 minutes)
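
The improvement figure follows directly from the initial and final loss values. A minimal sketch of that arithmetic, where `loss_reduction` is a hypothetical helper name and not part of the training code:

```python
def loss_reduction(initial: float, final: float) -> tuple[float, float]:
    """Return (percent reduction, reduction factor) between two loss values."""
    if initial <= 0 or final <= 0:
        raise ValueError("loss values must be positive")
    percent = (initial - final) / initial * 100.0
    factor = initial / final
    return percent, factor


# Values taken from the results table above.
percent, factor = loss_reduction(2.221, 0.0166)
print(f"{percent:.2f}% reduction, {factor:.1f}x lower")
```

The percentage and the factor are two views of the same ratio, so either one can be recomputed from the other.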

Comparison with Baseline

This model's final loss is about 6x lower than the baseline's (0.102 after 50 epochs).

Model	Final Loss	Improvement
Baseline	0.102	-
Optimized	0.0166	83.7% lower
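
The "6x" and "83.7%" figures describe the same comparison in two ways. As a rough sketch (the function name `ratio_to_percent_lower` is illustrative only), a loss ratio r corresponds to a relative reduction of 1 - 1/r:

```python
def ratio_to_percent_lower(ratio: float) -> float:
    """Convert 'N times lower' into 'percent lower' via 1 - 1/ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1.0 - 1.0 / ratio) * 100.0


baseline_loss = 0.102    # from the comparison table
optimized_loss = 0.0166  # from the comparison table

ratio = baseline_loss / optimized_loss
print(f"{ratio:.2f}x lower loss = {ratio_to_percent_lower(ratio):.1f}% lower")
```

Note that the two runs also differ in epoch count (50 vs 100), so the ratio compares end results rather than equal training budgets.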

Usage

import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the configuration
config = XttsConfig()
config.load_json("config.json")

# Load the model
model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_path="best_model.pth",
    vocab_path="vocab.json",
    eval=True,
    use_deepspeed=False
)

# Move the model to the GPU if one is available
if torch.cuda.is_available():
    model.cuda()

# Generate audio
outputs = model.synthesize(
    "Votre texte ici",
    config,
    speaker_wav="path/to/reference_audio.wav",
    language="fr",
    gpt_cond_len=6,
    temperature=0.75,
)
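
The call above returns the audio in memory, so it still has to be written to disk. The sketch below uses only the Python standard library. It assumes, as in Coqui TTS 0.22.0, that `outputs["wav"]` holds mono float samples in the range [-1, 1] at a 24 kHz sample rate; check both against your installed version. `save_wav` is a hypothetical helper, not part of the TTS package.

```python
import array
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        pcm.append(int(round(clipped * 32767)))
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)  # number of samples written


# Hypothetical usage with the result of model.synthesize(...):
# save_wav(outputs["wav"], "output.wav")
```

Clipping before conversion avoids integer overflow if the model produces samples slightly outside [-1, 1].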

Hardware Configuration

GPU: NVIDIA RTX 4090 (24 GB VRAM)
CPU: Intel Xeon (32 cores)
RAM: 62 GB

Environment

PyTorch: 2.5.1+cu118
TTS: 0.22.0
CUDA: 11.8
Python: 3.11
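
Since the checkpoint was produced with the versions listed above, it can help to compare them against the local installation before loading the model. A small sketch using only the standard library; the distribution names `torch` and `TTS` are assumed to match what pip reports, and `version_mismatches` is a hypothetical helper:

```python
from importlib import metadata

# Versions from the environment list above.
EXPECTED = {"torch": "2.5.1+cu118", "TTS": "0.22.0"}


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Return {name: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in version_mismatches(EXPECTED).items():
    print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the environment used for training.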

Included Files

best_model.pth (5.3 GB) - Optimized model weights
config.json - Model configuration
vocab.json - Tokenizer vocabulary
dvae.pth - DVAE checkpoint
mel_stats.pth - Mel spectrogram normalization statistics
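
Before loading the checkpoint, a quick check that the download is complete can save a confusing error later. A minimal sketch, assuming all files sit in one directory; `missing_files` is a hypothetical helper and `"."` is a placeholder for the download directory:

```python
from pathlib import Path

# File names from the list above.
REQUIRED_FILES = [
    "best_model.pth",
    "config.json",
    "vocab.json",
    "dvae.pth",
    "mel_stats.pth",
]


def missing_files(model_dir, required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


missing = missing_files(".")  # placeholder path, adjust to your setup
if missing:
    print("Missing files:", ", ".join(missing))
```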

License

Apache 2.0
