license: other
license_name: lfm1.0
license_link: LICENSE
tags:
  - audio
  - liquid
  - lfm2
  - edge
  - llama.cpp
  - gguf
base_model:
  - LiquidAI/LFM2-Audio-1.5B

Try LFM • Documentation • LEAP

LFM2-Audio-1.5B-GGUF

This example demonstrates the LFM2-Audio-1.5B audio model.

The original model is available on Hugging Face: LiquidAI/LFM2-Audio-1.5B.

The model supports the following modes:

ASR:
- input audio.wav, output text
TTS:
- input text, output audio.wav
interleaved:
- input text or audio.wav, output text and audio.wav

GGUFs

This model ships as three GGUF files.

Set $CKPT to the directory containing the downloaded GGUFs, $INPUT_WAV to the path of the input WAV file, and $OUTPUT_WAV to the path where generated audio should be written.

export CKPT=/data/playground/checkpoints/LFM2-Audio-1.5B-GGUF
export INPUT_WAV=/tmp/input.wav
export OUTPUT_WAV=/tmp/output.wav

(cd $CKPT && ls *.gguf)
audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf  LFM2-Audio-1.5B-Q8_0.gguf  mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf

Optionally, float16 GGUFs can be downloaded and used instead by replacing Q8_0 with F16 in the file names.
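Before running anything, it can help to confirm that all three files for the chosen quantization are present. The following is a minimal sketch, not part of the released tooling: `check_ggufs` is a hypothetical helper name, and the file names are the ones listed above with the quantization suffix swapped in.

```shell
# Hypothetical helper: verify that the three GGUF files for one quantization
# exist in a directory.
# Usage: check_ggufs <dir> [quant]   (quant defaults to Q8_0; F16 is the alternative)
check_ggufs() {
    dir=$1
    quant=${2:-Q8_0}
    missing=0
    for f in \
        "LFM2-Audio-1.5B-$quant.gguf" \
        "mmproj-audioencoder-LFM2-Audio-1.5B-$quant.gguf" \
        "audiodecoder-LFM2-Audio-1.5B-$quant.gguf"
    do
        if [ ! -f "$dir/$f" ]; then
            # Report every missing file instead of stopping at the first one.
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_ggufs "$CKPT" F16` returns a non-zero status and lists the missing files if the float16 set has not been downloaded.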

Binaries

The runners folder contains runners for android-arm64, macos-arm64, ubuntu-arm64, and ubuntu-x64.

runners
├── android-arm64
│   └── lfm2-audio-android-arm64.zip
├── macos-arm64
│   └── lfm2-audio-macos-arm64.zip
├── ubuntu-arm64
│   └── lfm2-audio-ubuntu-arm64.zip
└── ubuntu-x64
    └── lfm2-audio-ubuntu-x64.zip

Each package contains the llama-lfm2-audio and llama-mtmd-cli binaries.
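To pick the matching package on a desktop machine, the output of `uname` can be mapped to a runner folder name. This is a rough sketch under stated assumptions: `runner_platform` is a hypothetical helper, it treats any Linux system as a candidate for the ubuntu builds, and it does not detect Android (which also reports `Linux` and would need to be selected by hand).

```shell
# Hypothetical helper: map an OS name and CPU architecture (as printed by
# `uname -s` and `uname -m`) to one of the runner folder names.
# Both arguments are optional and default to the current machine.
runner_platform() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    case "$os/$arch" in
        Linux/x86_64)              echo "ubuntu-x64" ;;
        Linux/aarch64|Linux/arm64) echo "ubuntu-arm64" ;;
        Darwin/arm64)              echo "macos-arm64" ;;
        *)
            # Anything else has no matching package in the runners folder.
            echo "unsupported platform: $os/$arch" >&2
            return 1
            ;;
    esac
}
```

With this in place, `p=$(runner_platform) && unzip "runners/$p/lfm2-audio-$p.zip"` would extract the package for the current machine, assuming `unzip` is installed.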

Run using `llama-lfm2-audio`

There are three supported modes:

- ASR
- TTS
- interleaved

The mode is selected by the system prompt. The system prompt is subject to restrictions; the binary checks them and raises an error if they are violated.
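Because the mode depends on the exact system prompt, a script can keep the three strings in one place rather than retyping them. The sketch below uses the system prompts that appear in the commands in this README; `lfm2_sys_prompt` and the mode names `asr`, `tts`, and `interleaved` are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: print the system prompt for a given mode.
# The prompt strings are copied from the example commands in this README.
lfm2_sys_prompt() {
    case "$1" in
        asr)         echo "Perform ASR." ;;
        tts)         echo "Perform TTS." ;;
        interleaved) echo "Respond with interleaved text and audio." ;;
        *)
            echo "unknown mode: $1" >&2
            return 1
            ;;
    esac
}
```

A command can then pass `-sys "$(lfm2_sys_prompt asr)"` instead of a hand-typed string.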

ASR

ASR requires -sys "Perform ASR." and --audio audio.wav for input. It prints the transcribed text to the console.

lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform ASR." --audio $INPUT_WAV

TTS

TTS requires -sys "Perform TTS.", a text prompt such as -p "What is this obsession people have with books?" for input, and --output output.wav for output. It saves the generated audio to output.wav.

lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform TTS." -p "What is this obsession people have with books?" --output $OUTPUT_WAV

Interleaved

Interleaved mode produces both text and audio as output, and accepts either text or audio as input.

lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Respond with interleaved text and audio." --audio $INPUT_WAV --output $OUTPUT_WAV

Run ASR using `llama-mtmd-cli`

Use the llama-mtmd-cli binary from the package, or build it following the standard llama.cpp build procedure.

lfm2-audio-<platform>/llama-mtmd-cli -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -p "<__media__>" -sys "Perform ASR." --audio $INPUT_WAV

Debug

For reproducible results, set --temp 0.
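One way to confirm that a run is reproducible is to execute the same command twice and compare what it prints. The helper below is a minimal sketch: `check_reproducible` is a hypothetical name, and it assumes the generated text goes to standard output while log messages go to standard error, which may not hold for every build.

```shell
# Hypothetical helper: run the given command twice and compare its standard
# output. Returns 0 if both runs print identical text, 1 if they differ,
# and 2 if the command itself fails.
check_reproducible() {
    # Standard error is discarded on the assumption that it only carries logs.
    first=$("$@" 2>/dev/null) || return 2
    second=$("$@" 2>/dev/null) || return 2
    [ "$first" = "$second" ]
}
```

Prefixing the ASR command above with `check_reproducible` and appending `--temp 0` should then return a zero exit status if the transcription is stable across runs.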