Fish Speech - Luxembourgish TTS

Fine-tuned Fish Speech Dual-AR model for Luxembourgish text-to-speech.

Model Details

  • Base Model: fishaudio/openaudio-s1-mini
  • Architecture: Dual-AR Transformer (860M parameters)
  • Language: Luxembourgish (lb)
  • Training Data: 32,000 samples from a male Luxembourgish speaker
  • Training Steps: 9,000 (~2.4 epochs)
  • Fine-tuned on: NVIDIA RTX 5090

Usage

Requires a working Fish Speech installation. Run the command below from the root of the Fish Speech repository.

# WebUI
python tools/run_webui.py \
    --llama-checkpoint-path vivienhenz/fish-speech-luxembourgish \
    --decoder-checkpoint-path fishaudio/openaudio-s1-mini/codec.pth
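The command above passes Hugging Face repository names as checkpoint paths. If your Fish Speech version expects local files instead (an assumption, not something this card states), the following minimal sketch downloads both repositories with `huggingface_hub.snapshot_download` and rebuilds the same command with local paths. The `checkpoints/` layout and the helper names `download_checkpoints` and `build_webui_command` are hypothetical; only the two flags and the `codec.pth` file name come from the command above.

```python
from pathlib import Path

# Repository names taken from the Usage command above.
FINETUNED_REPO = "vivienhenz/fish-speech-luxembourgish"
BASE_REPO = "fishaudio/openaudio-s1-mini"


def build_webui_command(llama_dir: Path, decoder_ckpt: Path) -> list[str]:
    """Return the WebUI command from the Usage section with local paths filled in."""
    return [
        "python", "tools/run_webui.py",
        "--llama-checkpoint-path", str(llama_dir),
        "--decoder-checkpoint-path", str(decoder_ckpt),
    ]


def download_checkpoints(root: Path) -> tuple[Path, Path]:
    """Download both repositories under `root` (hypothetical directory layout).

    Returns the fine-tuned checkpoint directory and the path to codec.pth
    inside the base model directory.
    """
    # Imported here so the command builder works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    llama_dir = root / "fish-speech-luxembourgish"
    base_dir = root / "openaudio-s1-mini"
    snapshot_download(repo_id=FINETUNED_REPO, local_dir=llama_dir)
    snapshot_download(repo_id=BASE_REPO, local_dir=base_dir)
    return llama_dir, base_dir / "codec.pth"
```

From the Fish Speech repository root, calling `download_checkpoints(Path("checkpoints"))` and passing the returned paths to `subprocess.run(build_webui_command(...), check=True)` would launch the WebUI. If the base repository requires authentication, log in to Hugging Face first.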

Training Details

  • Dataset: 32,000 male voice samples (~28 hours)
  • Optimizer: AdamW (lr=1e-4)
  • Precision: bf16-mixed
  • Training time: ~3 hours on RTX 5090
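As a rough sanity check, the figures on this card can be combined to estimate the average clip length, the number of samples seen per optimizer step, and the time per step. All inputs are the approximate values stated above, so the results are estimates only. In particular, the samples-per-step value is implied by the step and epoch counts, not a batch size reported by the author.

```python
# Approximate figures taken from this model card.
SAMPLES = 32_000      # training samples
AUDIO_HOURS = 28      # total audio duration (~28 hours)
STEPS = 9_000         # training steps
EPOCHS = 2.4          # reported epochs (~2.4)
TRAIN_HOURS = 3       # wall-clock training time (~3 hours)

# Average clip length in seconds.
avg_clip_seconds = AUDIO_HOURS * 3600 / SAMPLES

# Samples consumed per step, implied by steps and epochs
# (an estimate, not a stated batch size).
samples_per_step = EPOCHS * SAMPLES / STEPS

# Average wall-clock time per training step in seconds.
seconds_per_step = TRAIN_HOURS * 3600 / STEPS

print(f"average clip length: {avg_clip_seconds:.2f} s")
print(f"implied samples per step: {samples_per_step:.2f}")
print(f"time per step: {seconds_per_step:.2f} s")
```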

Example

Input: d'nottär huet haut de mueren zwou venten.

Output: Synthesized speech in a natural-sounding Luxembourgish male voice

License

CC-BY-NC-SA-4.0 (inherited from Fish Speech)
