metadata
license: other
license_name: lfm1.0
license_link: LICENSE
tags:
  - audio
  - liquid
  - lfm2
  - edge
  - llama.cpp
  - gguf
base_model:
  - LiquidAI/LFM2-Audio-1.5B
Liquid AI
Try LFM • Documentation • LEAP

LFM2-Audio-1.5B-GGUF

This example demonstrates the LFM2-Audio-1.5B audio model.

Link to HF: LiquidAI/LFM2-Audio-1.5B.

The model supports the following modes:

  • ASR:
    • input audio.wav, output text
  • TTS:
    • input text, output audio.wav
  • interleaved:
    • input text or audio.wav, output text and audio.wav

GGUFS

There are three GGUFs for this model.

Set $CKPT to the path of the directory containing the downloaded GGUFs. Set $INPUT_WAV to the path of the input WAV file, and $OUTPUT_WAV to the path where generated audio should be written.

export CKPT=/data/playground/checkpoints/LFM2-Audio-1.5B-GGUF
export INPUT_WAV=/tmp/input.wav
export OUTPUT_WAV=/tmp/output.wav
(cd $CKPT && ls *.gguf)
audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf  LFM2-Audio-1.5B-Q8_0.gguf  mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf

Optionally, float16 GGUFs can be downloaded and used by replacing Q8_0 with F16 in the file names.
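Before running anything, it can help to confirm that all three files for the chosen quantization are present. The following is a minimal sketch, not part of the official tooling: `gguf_names` and `check_ggufs` are hypothetical helper names, and the file names are taken from the listing above.

```shell
# Hypothetical helper: print the three GGUF file names for a quantization
# suffix (Q8_0 by default, F16 for the float16 variant).
gguf_names() {
  quant="${1:-Q8_0}"
  printf '%s\n' \
    "LFM2-Audio-1.5B-${quant}.gguf" \
    "mmproj-audioencoder-LFM2-Audio-1.5B-${quant}.gguf" \
    "audiodecoder-LFM2-Audio-1.5B-${quant}.gguf"
}

# Hypothetical helper: report every expected file that is missing from the
# directory in $CKPT and return non-zero if at least one is absent.
check_ggufs() {
  missing=0
  for name in $(gguf_names "$1"); do
    if [ ! -f "$CKPT/$name" ]; then
      echo "missing: $CKPT/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_ggufs F16` checks for the float16 files instead of the default Q8_0 ones.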

Binaries

The runners folder contains runners for android-arm64, macos-arm64, ubuntu-arm64, and ubuntu-x64.

runners
├── android-arm64
│   └── lfm2-audio-android-arm64.zip
├── macos-arm64
│   └── lfm2-audio-macos-arm64.zip
├── ubuntu-arm64
│   └── lfm2-audio-ubuntu-arm64.zip
└── ubuntu-x64
    └── lfm2-audio-ubuntu-x64.zip

Each package contains llama-lfm2-audio and llama-mtmd-cli binaries.
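Choosing the right package can be scripted from the output of `uname`. The sketch below is illustrative only: `runner_for` is a hypothetical helper, and it assumes that a Linux host should use the ubuntu packages. Android is not detected here (it also reports `Linux`), so pick android-arm64 by hand on that platform.

```shell
# Hypothetical helper: map `uname -s` / `uname -m` output to a runner name.
runner_for() {
  os="$1"
  arch="$2"
  case "$os/$arch" in
    Darwin/arm64)              echo "macos-arm64" ;;
    Linux/aarch64|Linux/arm64) echo "ubuntu-arm64" ;;
    Linux/x86_64)              echo "ubuntu-x64" ;;
    *) echo "unsupported platform: $os/$arch" >&2; return 1 ;;
  esac
}

# Print the path of the zip for the current machine, if one is available.
platform=$(runner_for "$(uname -s)" "$(uname -m)") &&
  echo "runners/$platform/lfm2-audio-$platform.zip"
```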

Run using llama-lfm2-audio

There are three supported modes:

  • ASR
  • TTS
  • interleaved

The mode is selected by the system prompt. Only specific system prompts are supported; the binary validates the prompt and raises an error if it is not accepted.
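Since each mode is tied to one system prompt, a small lookup keeps scripts from mistyping it. This is a sketch: `sys_prompt_for` is a hypothetical helper, and the three prompt strings are the ones used in the commands in this README.

```shell
# Hypothetical helper: print the system prompt that selects a given mode.
sys_prompt_for() {
  case "$1" in
    asr)         echo "Perform ASR." ;;
    tts)         echo "Perform TTS." ;;
    interleaved) echo "Respond with interleaved text and audio." ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

The result can then be passed on the command line as `-sys "$(sys_prompt_for asr)"`.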

ASR

ASR requires -sys "Perform ASR." and --audio audio.wav for input. It prints the transcribed text to the console.

lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform ASR." --audio $INPUT_WAV
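To transcribe several files, the same command can be wrapped in a loop. The sketch below rests on assumptions: `LFM2_AUDIO_BIN` is a hypothetical variable that should point at the unpacked llama-lfm2-audio binary, `transcribe_dir` is a hypothetical helper, and the transcript is assumed to arrive on standard output (log lines, if the binary prints any there, would end up in the same file).

```shell
# Hypothetical helper: run ASR on every .wav file in a directory and save
# whatever the binary prints to stdout as <name>.txt next to the input.
# Assumes $LFM2_AUDIO_BIN points at llama-lfm2-audio and $CKPT at the GGUFs.
transcribe_dir() {
  dir="$1"
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue   # the glob matched nothing
    "$LFM2_AUDIO_BIN" \
      -m "$CKPT/LFM2-Audio-1.5B-Q8_0.gguf" \
      --mmproj "$CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
      -mv "$CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf" \
      -sys "Perform ASR." --audio "$wav" > "${wav%.wav}.txt" || return 1
  done
}
```

The loop stops at the first file for which the binary exits with an error, so a failed run is not silently skipped.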

TTS

TTS requires -sys "Perform TTS.", the text to speak passed with -p (for example, -p "What is this obsession people have with books?"), and --output output.wav. The generated audio is saved to output.wav.

lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform TTS." -p "What is this obsession people have with books?" --output $OUTPUT_WAV

Interleaved

Interleaved mode produces both text and audio as output, and can consume text or audio as input. It requires -sys "Respond with interleaved text and audio.".

lfm2-audio-<platform>/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Respond with interleaved text and audio." --audio $INPUT_WAV --output $OUTPUT_WAV

Run ASR using llama-mtmd-cli

Build llama-mtmd-cli following the standard build procedure.

lfm2-audio-<platform>/llama-mtmd-cli -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -p "<__media__>" -sys "Perform ASR." --audio $INPUT_WAV

Debug

For reproducible results, set --temp 0.
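One way to confirm that a setup is deterministic is to run the same ASR command twice with `--temp 0` and compare the output. This is a sketch under assumptions: `LFM2_AUDIO_BIN` is a hypothetical variable pointing at the llama-lfm2-audio binary, `run_asr` and `asr_is_reproducible` are hypothetical helpers, and only standard output is compared.

```shell
# Hypothetical helper: run ASR on one file with sampling temperature 0.
# Assumes $LFM2_AUDIO_BIN points at llama-lfm2-audio and $CKPT at the GGUFs.
run_asr() {
  "$LFM2_AUDIO_BIN" \
    -m "$CKPT/LFM2-Audio-1.5B-Q8_0.gguf" \
    --mmproj "$CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
    -mv "$CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf" \
    -sys "Perform ASR." --audio "$1" --temp 0
}

# Hypothetical helper: succeed only if two runs print identical stdout.
asr_is_reproducible() {
  first=$(run_asr "$1" 2>/dev/null) || return 1
  second=$(run_asr "$1" 2>/dev/null) || return 1
  [ "$first" = "$second" ]
}
```

A typical call would be `asr_is_reproducible "$INPUT_WAV" && echo "outputs match"`.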