---
title: ACE-Step 1.5 XL Music Generation (CPU)
emoji: 🎵
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false
license: mit
tags:
  - music-generation
  - ace-step
  - gguf
  - lora
  - training
  - cpu
  - mcp-server
short_description: ACE-Step 1.5 XL - CPU music generation + LoRA training
models:
  - ACE-Step/Ace-Step1.5
startup_duration_timeout: 2h
---

# ACE-Step 1.5 XL Music Generation (CPU)

**GGUF inference + LoRA training** on free CPU Spaces. Powered by [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp).

## Features

- **Music Generation** -- text/lyrics to stereo 48kHz MP3 via GGUF quantized models
- **LoRA Training** -- fine-tune on your own audio (~11s/epoch on CPU, ~1.4s/epoch on GPU)
- **Auto-Captioning** -- librosa BPM/key/time-signature analysis + LM understand mode (caption + lyrics extraction)
- **Multiple LM Sizes** -- 0.6B / 1.7B / 4B language models (downloaded on demand)
- **Cancel + Download** -- cancel training mid-epoch, download the trained LoRA adapter

## Music Generation

1. Enter a music description
2. Enter lyrics or check **Instrumental**
3. Adjust BPM, duration, steps, seed
4. Select a LoRA adapter if you have trained one
5. Click **Generate Music**

**Timing:** ~270s for 10s of audio with the 1.7B LM and 8 steps on CPU.

## LoRA Training

1. Upload audio files (any length; auto-tiled into 30s chunks by the VAE)
2. Set LoRA name, epochs, learning rate, rank
3. Click **Train** -- ace-server stops during training and restarts afterwards
4. Use **Cancel** to stop early (saves a checkpoint)
5. **Download** the trained adapter file
6. The trained adapter appears in the LoRA dropdown

**Timing:** ~170s preprocessing + ~11s/epoch on CPU. GPU: ~1.4s/epoch.

**Limits:** 30 min of audio total across all files; files exceeding the cap are truncated with a warning. 50 files max. 8h training timeout.
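The tiling and cap described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Space's actual implementation: the function name `tile_for_training`, the constants, and the warning messages are all hypothetical; only the 30s chunk size, the 30-minute cap, and the 48kHz sample rate come from this README.

```python
import numpy as np

SAMPLE_RATE = 48_000         # the Space works with 48 kHz audio
CHUNK_SECONDS = 30           # tiling window applied before VAE encoding
MAX_TOTAL_SECONDS = 30 * 60  # 30-minute cap across all uploaded files

def tile_for_training(files: list[np.ndarray]) -> list[np.ndarray]:
    """Split mono waveforms into 30 s tiles, truncating at the global cap."""
    budget = MAX_TOTAL_SECONDS * SAMPLE_RATE  # remaining samples allowed
    chunk = CHUNK_SECONDS * SAMPLE_RATE
    tiles = []
    for audio in files:
        if budget <= 0:
            print("warning: 30-minute cap reached, remaining files skipped")
            break
        if len(audio) > budget:
            print("warning: file truncated at the 30-minute cap")
            audio = audio[:budget]
        budget -= len(audio)
        # Tile into fixed-size chunks; the final tile may be shorter.
        for start in range(0, len(audio), chunk):
            tiles.append(audio[start:start + chunk])
    return tiles
```

For example, a 70-second file plus a 10-second file would yield four tiles: 30s + 30s + 10s from the first file, and one 10s tile from the second.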
**Settings (per Side-Step author recommendations):**

- LR: 3e-4
- Rank: 32, Alpha: 64
- Epochs: 200-500 for 3-10 files
- Optimizer: Adafactor (minimal memory)
- Variant: standard turbo (not XL -- XL swaps on 18 GB)

## Captioning Pipeline

Training audio is auto-captioned before preprocessing:

| Method | What it extracts | Speed |
|--------|------------------|-------|
| **librosa** | BPM, key, time signature | ~3s/file |
| **LM understand** (GPU) | Rich caption + lyrics + metadata | ~52s/file |
| **ace-server /understand** (Space) | Same as LM understand, via GGUF | ~30s/file |
| **.txt/.json sidecar** | User-provided caption (if present) | instant |

On the Space, captioning uses ace-server /understand before training; locally, it uses the PyTorch LM understand mode.

## Models

| Component | GGUF | Size | Purpose |
|-----------|------|------|---------|
| DiT XL turbo | acestep-v15-xl-turbo-Q4_K_M | 2.8 GB | Music generation (no LoRA) |
| DiT standard turbo | acestep-v15-turbo-Q4_K_M | 1.1 GB | Music generation (with LoRA) |
| LM 1.7B | acestep-5Hz-lm-1.7B-Q8_0 | 1.7 GB | Caption understanding |
| Text Encoder | Qwen3-Embedding-0.6B-Q8_0 | 0.75 GB | Text encoding |
| VAE | vae-BF16 | 0.32 GB | Audio encode/decode |

## API

### Generate Music

```python
from gradio_client import Client

client = Client("WeReCooking/ACE-Step-CPU")
result = client.predict(
    caption="upbeat electronic dance music",
    lyrics="[Instrumental]",
    instrumental=True,
    bpm=120,
    duration=10,
    seed=-1,
    steps=8,
    lora_select="None (no LoRA)",
    lm_model_select="acestep-5Hz-lm-1.7B-Q8_0.gguf",
    api_name="/generate",
)
```

### Train LoRA

```python
from gradio_client import Client, handle_file

client = Client("WeReCooking/ACE-Step-CPU")
result = client.predict(
    audio_files=[handle_file("song.mp3")],
    lora_name="my-style",
    epochs=200,
    lr=0.0003,
    rank=32,
    api_name="/train_lora",
)
```

### MCP (Model Context Protocol)

```json
{
  "mcpServers": {
    "ace-step": {"url": "https://werecooking-ace-step-cpu.hf.space/gradio_api/mcp/"}
  }
}
```

## CLI

```bash
python app.py "upbeat electronic dance music" --duration 10 --steps 8
python app.py "jazz piano" --adapter my-style --seed 42
```

## Architecture

- **Inference:** GGUF via [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp)
- **Training:** PyTorch, ported from [Side-Step](https://github.com/koda-dernet/Side-Step) (commit ecd13bd)
- **Captioning:** librosa + LM understand (PyTorch or ace-server /understand)
- Training stops ace-server to free RAM and restarts it afterwards with the new adapters
- Inference is blocked during training, with a clear message

## Credits

- [ACE-Step 1.5](https://github.com/ace-step/ACE-Step-1.5)
- [acestep.cpp](https://github.com/ServeurpersoCom/acestep.cpp)
- [Side-Step](https://github.com/koda-dernet/Side-Step)
- [Serveurperso/ACE-Step-1.5-GGUF](https://huggingface.co/Serveurperso/ACE-Step-1.5-GGUF)