---
title: ACE-Step 1.5 XL Music Generation (CPU)
emoji: 🎵
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false
license: mit
tags:
  - music-generation
  - ace-step
  - gguf
  - lora
  - training
  - cpu
  - mcp-server
short_description: ACE-Step 1.5 XL - CPU music generation + LoRA training
models:
  - ACE-Step/Ace-Step1.5
startup_duration_timeout: 2h
---

# ACE-Step 1.5 XL Music Generation (CPU)

GGUF inference + LoRA training on free CPU Spaces. Powered by `acestep.cpp`.

## Features

- **Music Generation** -- text/lyrics to stereo 48 kHz MP3 via GGUF-quantized models
- **LoRA Training** -- fine-tune on your own audio (~11 s/epoch on CPU, ~1.4 s/epoch on GPU)
- **Auto-Captioning** -- librosa BPM/key/time-signature detection plus LM understand mode (caption + lyrics extraction)
- **Multiple LM Sizes** -- 0.6B / 1.7B / 4B language models (downloaded on demand)
- **Cancel + Download** -- cancel training mid-epoch, download the trained LoRA adapter

## Music Generation

1. Enter a music description
2. Enter lyrics, or check **Instrumental**
3. Adjust BPM, duration, steps, and seed
4. Select a LoRA adapter if you have trained one
5. Click **Generate Music**

**Timing:** ~270 s for 10 s of audio with the 1.7B LM at 8 steps on CPU.

## LoRA Training

1. Upload audio files (any length; the VAE auto-tiles them into 30 s chunks)
2. Set the LoRA name, epochs, learning rate, and rank
3. Click **Train** -- ace-server stops during training and restarts afterwards
4. Use **Cancel** to stop early (a checkpoint is saved)
5. Download the trained adapter file
6. The trained adapter appears in the LoRA dropdown

**Timing:** ~170 s preprocessing + ~11 s/epoch on CPU; ~1.4 s/epoch on GPU.

**Limits:** 30 minutes of audio total across all files (files exceeding the cap are truncated with a warning), 50 files max, 8 h training timeout.
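The upload limits and 30 s tiling described above can be sketched as follows. This is a minimal illustration only; `enforce_caps`, `tile_count`, and the constant names are hypothetical, not the Space's actual code:

```python
MAX_FILES = 50
MAX_TOTAL_SEC = 30 * 60   # 30 minutes of audio across all files
TILE_SEC = 30             # the VAE processes fixed 30 s chunks

def enforce_caps(durations):
    """Given per-file durations (seconds), return the durations actually
    used for training, truncating the file that crosses the total cap."""
    if len(durations) > MAX_FILES:
        raise ValueError(f"at most {MAX_FILES} files allowed")
    used, total = [], 0.0
    for d in durations:
        if total + d > MAX_TOTAL_SEC:
            remaining = MAX_TOTAL_SEC - total
            if remaining > 0:
                used.append(remaining)  # truncated (with a warning in the UI)
            break
        used.append(d)
        total += d
    return used

def tile_count(duration_sec):
    """Number of 30 s VAE tiles for one file (last tile may be partial)."""
    return -(-int(duration_sec) // TILE_SEC)  # ceiling division
```

For example, three 10-minute files fit exactly, but a third 15-minute file would be cut to its first 10 minutes.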

**Settings** (per the Side-Step author's recommendations):

- LR: 3e-4
- Rank: 32, Alpha: 64
- Epochs: 200-500 for 3-10 files
- Optimizer: Adafactor (minimal memory)
- Variant: standard turbo (not XL -- XL hits swap on 18 GB RAM)
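In PEFT/Transformers terms, the recommended settings translate to roughly the following configuration. This is a sketch only: the `target_modules` names are an assumption about the DiT's attention projections, not taken from Side-Step's actual code:

```python
from peft import LoraConfig
from transformers.optimization import Adafactor

lora_cfg = LoraConfig(
    r=32,           # rank
    lora_alpha=64,  # alpha = 2 * rank
    target_modules=["to_q", "to_k", "to_v"],  # assumed attention projections
)

# Adafactor with a fixed LR keeps optimizer state small, which is what
# makes training fit on a CPU Space:
# optimizer = Adafactor(model.parameters(), lr=3e-4,
#                       scale_parameter=False, relative_step=False)
```

Adafactor's low memory footprint (no full-size momentum buffers, unlike Adam) is the reason it is recommended here.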

## Captioning Pipeline

Training audio is auto-captioned before preprocessing:

| Method | What it extracts | Speed |
| --- | --- | --- |
| librosa | BPM, key, time signature | ~3 s/file |
| LM understand (GPU) | Rich caption + lyrics + metadata | ~52 s/file |
| ace-server `/understand` (Space) | Same as LM, via GGUF | ~30 s/file |
| `.txt`/`.json` sidecar | User-provided caption (if present) | instant |

On the Space, ace-server `/understand` runs before training; locally, the PyTorch LM understand mode is used instead.
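The source-selection logic implied by the table (a user-provided sidecar wins; otherwise the LM backend depends on where the app runs) could be dispatched roughly like this. `pick_caption_backend` and the `on_space` flag are illustrative names, not the Space's actual code:

```python
import os

def pick_caption_backend(audio_path, on_space=None):
    """Return which captioning method would run for one training file,
    following the priority implied by the table above."""
    if on_space is None:
        on_space = bool(os.environ.get("SPACE_ID"))  # set on HF Spaces
    # 1. A user-provided .txt/.json sidecar wins outright.
    base, _ = os.path.splitext(audio_path)
    for ext in (".txt", ".json"):
        if os.path.exists(base + ext):
            return "sidecar"
    # 2. Otherwise use the LM: GGUF /understand on the Space, the
    #    PyTorch LM locally. The librosa BPM/key/time-signature pass
    #    runs in both cases and is merged into the caption.
    return "ace-server /understand" if on_space else "LM understand"
```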

## Models

| Component | GGUF | Size | Purpose |
| --- | --- | --- | --- |
| DiT XL turbo | `acestep-v15-xl-turbo-Q4_K_M` | 2.8 GB | Music generation (no LoRA) |
| DiT standard turbo | `acestep-v15-turbo-Q4_K_M` | 1.1 GB | Music generation (with LoRA) |
| LM 1.7B | `acestep-5Hz-lm-1.7B-Q8_0` | 1.7 GB | Caption understanding |
| Text Encoder | `Qwen3-Embedding-0.6B-Q8_0` | 0.75 GB | Text encoding |
| VAE | `vae-BF16` | 0.32 GB | Audio encode/decode |

## API

### Generate Music

```python
from gradio_client import Client

client = Client("WeReCooking/ACE-Step-CPU")
result = client.predict(
    caption="upbeat electronic dance music",
    lyrics="[Instrumental]",
    instrumental=True, bpm=120, duration=10, seed=-1, steps=8,
    lora_select="None (no LoRA)",
    lm_model_select="acestep-5Hz-lm-1.7B-Q8_0.gguf",
    api_name="/generate",
)
```

### Train LoRA

```python
from gradio_client import Client, handle_file

client = Client("WeReCooking/ACE-Step-CPU")
result = client.predict(
    audio_files=[handle_file("song.mp3")],
    lora_name="my-style", epochs=200, lr=0.0003, rank=32,
    api_name="/train_lora",
)
```

## MCP (Model Context Protocol)

```json
{
  "mcpServers": {
    "ace-step": {"url": "https://werecooking-ace-step-cpu.hf.space/gradio_api/mcp/"}
  }
}
```

## CLI

```bash
python app.py "upbeat electronic dance music" --duration 10 --steps 8
python app.py "jazz piano" --adapter my-style --seed 42
```

## Architecture

- Inference: GGUF via `acestep.cpp`
- Training: PyTorch, ported from Side-Step (commit ecd13bd)
- Captioning: librosa + LM understand (PyTorch or ace-server `/understand`)
- Training stops ace-server to free RAM, then restarts it with the new adapters
- Inference is blocked during training, with a clear message
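The last two points, freeing RAM for training and rejecting generation requests while ace-server is down, can be sketched with a simple gate. `TrainingGate` and its method names are illustrative, not the Space's actual code:

```python
import threading

class TrainingGate:
    """Blocks inference while a LoRA training run owns the model RAM."""

    def __init__(self):
        self._training = threading.Event()

    def start_training(self):
        # In the real Space this is where ace-server would be stopped
        # to free RAM for the PyTorch trainer.
        self._training.set()

    def finish_training(self):
        # ...and where ace-server would be restarted, picking up any
        # newly trained adapters for the LoRA dropdown.
        self._training.clear()

    def generate(self, caption):
        if self._training.is_set():
            return "Training in progress - generation is disabled until it finishes."
        return f"<audio for: {caption}>"
```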

## Credits