---
license: other
license_name: lfm1.0
license_link: LICENSE
tags:
- audio
- liquid
- lfm2
- edge
- llama.cpp
- gguf
base_model:
- LiquidAI/LFM2-Audio-1.5B
---
Liquid AI
# LFM2-Audio-1.5B-GGUF

This example demonstrates the **LFM2-Audio-1.5B** audio model. Link to HF: [LiquidAI/LFM2-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B).

The model supports the following modes:

- ASR:
  - input `audio.wav`, output `text`
- TTS:
  - input `text`, output `audio.wav`
- interleaved:
  - input `text` or `audio.wav`, output `text` and `audio.wav`

## GGUFS

There are 3 GGUFs in total for this model.

Set `$CKPT` to the directory containing the downloaded GGUFs, `$INPUT_WAV` to the input WAV file, and `$OUTPUT_WAV` to the path where generated audio should be written.

```console
export CKPT=/data/playground/checkpoints/LFM2-Audio-1.5B-GGUF
export INPUT_WAV=/tmp/input.wav
export OUTPUT_WAV=/tmp/output.wav
```

```console
(cd $CKPT && ls *.gguf)
audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf
LFM2-Audio-1.5B-Q8_0.gguf
mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf
```

Optionally, float16 GGUFs can be downloaded and used by replacing `Q8_0` with `F16` in the file names.

## Binaries

The `runners` folder contains runners for android-arm64, macos-arm64, ubuntu-arm64, and ubuntu-x64.

```console
runners
├── android-arm64
│   └── lfm2-audio-android-arm64.zip
├── macos-arm64
│   └── lfm2-audio-macos-arm64.zip
├── ubuntu-arm64
│   └── lfm2-audio-ubuntu-arm64.zip
└── ubuntu-x64
    └── lfm2-audio-ubuntu-x64.zip
```

Each package contains the `llama-lfm2-audio` and `llama-mtmd-cli` binaries.

## Run using `llama-lfm2-audio`

There are 3 supported modes:

- ASR
- TTS
- interleaved

The mode is selected by the system prompt. There are limitations on the system prompt; the binary checks them and raises an error if they are violated.

### ASR

ASR requires `-sys "Perform ASR."` and `--audio audio.wav` for input. It prints the transcribed text to the console.

```console
lfm2-audio-/llama-lfm2-audio \
  -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf \
  --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -sys "Perform ASR." \
  --audio $INPUT_WAV
```

### TTS

TTS requires `-sys "Perform TTS."`, a text prompt such as `-p "What is this obsession people have with books?"` for input, and `--output output.wav` for output. It saves the generated audio to `output.wav`.

```console
lfm2-audio-/llama-lfm2-audio \
  -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf \
  --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -sys "Perform TTS." \
  -p "What is this obsession people have with books?" \
  --output $OUTPUT_WAV
```

### Interleaved

Interleaved mode produces both text and audio as output, and can consume text or audio as input.

```console
lfm2-audio-/llama-lfm2-audio \
  -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf \
  --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -sys "Respond with interleaved text and audio." \
  --audio $INPUT_WAV \
  --output $OUTPUT_WAV
```

## Run ASR using `llama-mtmd-cli`

Build `llama-mtmd-cli` following the standard build procedure.

```console
lfm2-audio-/llama-mtmd-cli \
  -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf \
  --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf \
  -p "<__media__>" \
  -sys "Perform ASR." \
  --audio $INPUT_WAV
```

### Debug

For reproducible results, set `--temp 0`.
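
When a run fails, one thing worth ruling out first is a missing GGUF. The following is a minimal sketch of a pre-flight check, assuming the file naming shown in the GGUFs section. `check_ggufs` is a hypothetical helper written for this example and is not part of the released packages.

```shell
# Hypothetical helper (not shipped with the runners): verify that the three
# GGUFs for one quantization exist in a checkpoint directory.
# Usage: check_ggufs <dir> [Q8_0|F16]
check_ggufs() {
    dir="$1"
    quant="${2:-Q8_0}"   # defaults to Q8_0; F16 is the optional float16 variant
    missing=0
    for f in \
        "LFM2-Audio-1.5B-${quant}.gguf" \
        "mmproj-audioencoder-LFM2-Audio-1.5B-${quant}.gguf" \
        "audiodecoder-LFM2-Audio-1.5B-${quant}.gguf"
    do
        if [ ! -f "${dir}/${f}" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: ${dir}/${f}" >&2
            missing=1
        fi
    done
    # Exit status 0 when all three files are present, 1 otherwise.
    return "$missing"
}
```

It could be called as `check_ggufs "$CKPT" Q8_0` before any of the commands above; a non-zero exit status means at least one of the three files was not found.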