The sections below show how to use Open4bits/LFM2.5-1.2B-Base-Quantized with libraries and local apps.

Libraries

How to use Open4bits/LFM2.5-1.2B-Base-Quantized with Transformers:

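# Use a pipeline as a high-level helper
from transformers import pipeline

# trust_remote_code=True is required when loading this model (see Model Information)
pipe = pipeline("text-generation", model="Open4bits/LFM2.5-1.2B-Base-Quantized", trust_remote_code=True)

# Load the model directly; AutoModelForCausalLM includes the language-modeling head needed for generation
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Open4bits/LFM2.5-1.2B-Base-Quantized", dtype="auto", trust_remote_code=True)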

Local Apps

vLLM

How to use Open4bits/LFM2.5-1.2B-Base-Quantized with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Open4bits/LFM2.5-1.2B-Base-Quantized"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Open4bits/LFM2.5-1.2B-Base-Quantized",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
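
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on localhost:8000 and returns the usual OpenAI-style completions response. The helper names `build_completion_request` and `complete` are illustrative and are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Open4bits/LFM2.5-1.2B-Base-Quantized"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style completions put the generated text in choices[0].text
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` should return the continuation. For a server on another port, pass for example `base_url="http://localhost:30000"`.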

Use Docker

docker model run hf.co/Open4bits/LFM2.5-1.2B-Base-Quantized

SGLang

How to use Open4bits/LFM2.5-1.2B-Base-Quantized with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Open4bits/LFM2.5-1.2B-Base-Quantized" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Open4bits/LFM2.5-1.2B-Base-Quantized",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Open4bits/LFM2.5-1.2B-Base-Quantized" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Open4bits/LFM2.5-1.2B-Base-Quantized",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Open4bits/LFM2.5-1.2B-Base-Quantized with Docker Model Runner:
```
docker model run hf.co/Open4bits/LFM2.5-1.2B-Base-Quantized
```

Open4bits / LFM2.5-1.2B-Base-Quantized

This repository provides multiple quantized variants of the LFM 2.5 Base (1.2B parameters) model for efficient inference and deployment.

The original model was developed and released by LiquidAI:

https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base

These quantizations are maintained and published by ArkAiLab under the Open4bits organization to improve accessibility across a wide range of hardware.

Available Quantization Formats

Each format is stored in a separate directory:

FP16 – Baseline half-precision weights
FP8 – High-performance low-precision format (GPU support required)
INT8 – Balanced performance and memory usage (BitsAndBytes)
NF4 (4-bit) – Maximum compression using BitsAndBytes double quantization
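
To compare the formats above, here is a rough back-of-the-envelope estimate of weight storage. It is a sketch under stated assumptions: exactly 1.2e9 parameters, every parameter stored at the nominal width of its format, and 1 GB = 1e9 bytes. Real checkpoints will differ, since quantization constants, layers kept in higher precision, activations, and the generation cache are all ignored here.

```python
# Assumed parameter count, taken from the "~1.2B" figure on this card.
PARAMETER_COUNT = 1.2e9

# Nominal storage width per parameter for each format (assumption:
# all weights are stored at this width, with no extra metadata).
BYTES_PER_PARAMETER = {
    "FP16": 2.0,
    "FP8": 1.0,
    "INT8": 1.0,
    "NF4": 0.5,
}


def weight_footprint_gb(fmt, parameter_count=PARAMETER_COUNT):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return parameter_count * BYTES_PER_PARAMETER[fmt] / 1e9


for fmt in BYTES_PER_PARAMETER:
    print(f"{fmt}: ~{weight_footprint_gb(fmt):.1f} GB")
```

Under these assumptions the weights take roughly 2.4 GB in FP16, 1.2 GB in FP8 or INT8, and 0.6 GB in NF4.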

Model Information

Model Name: LFM 2.5 Base
Parameters: ~1.2B
Architecture: Custom LiquidAI architecture
Original Author: LiquidAI
Quantized By: ArkAiLab (Open4bits)

This model requires trust_remote_code=True when loading.

Quantization Details

Quantized using PyTorch and Hugging Face Transformers
INT8 and NF4 formats use BitsAndBytes
FP8 provided where hardware support allows
No GPTQ, AWQ, or llama.cpp used
Compatible with Google Colab and Kaggle notebook environments
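
For readers who want to reproduce a similar NF4 variant, the sketch below shows how BitsAndBytes double quantization is commonly configured through Transformers. The exact settings used for this repository are not documented here, so the values in `NF4_SETTINGS`, the bfloat16 compute dtype, and the helper name `quantize_original_to_nf4` are assumptions. It requires the torch, transformers, accelerate, and bitsandbytes packages and a supported GPU.

```python
# Settings commonly used for NF4 with double quantization.
# Assumption: this repository's actual settings are not confirmed.
NF4_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def quantize_original_to_nf4(source_id="LiquidAI/LFM2.5-1.2B-Base"):
    """Load the original model with on-the-fly NF4 quantization."""
    # Imported lazily so the settings above can be inspected
    # without the heavy dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
        **NF4_SETTINGS,
    )
    return AutoModelForCausalLM.from_pretrained(
        source_id,
        quantization_config=config,
        trust_remote_code=True,
        device_map="auto",
    )
```

Double quantization also quantizes the per-block scaling constants, which saves a little more memory on top of plain 4-bit storage.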

Usage Example

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Open4bits/LFM2.5-1.2B-Base-Quantized"

# trust_remote_code=True is required when loading this model
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

# device_map="auto" places the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto"
)

# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
# Generate up to 50 new tokens
outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
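
Because each format is stored in a separate directory, selecting a specific variant may require the `subfolder` argument of `from_pretrained`. The sketch below is hedged accordingly: the directory name passed in (for example "nf4") is hypothetical and should be checked against the repository's file listing, and it assumes the tokenizer files sit in the same directory as the weights. The helper names are illustrative.

```python
MODEL_ID = "Open4bits/LFM2.5-1.2B-Base-Quantized"


def variant_load_kwargs(variant_dir):
    """Keyword arguments for loading one quantized variant's weights."""
    return {
        "subfolder": variant_dir,   # directory inside the repository
        "trust_remote_code": True,  # required by this model
        "device_map": "auto",
    }


def load_variant(variant_dir):
    """Load tokenizer and model from one variant directory."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: tokenizer files are stored next to the weights.
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID, subfolder=variant_dir, trust_remote_code=True
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, **variant_load_kwargs(variant_dir)
    )
    return tokenizer, model
```

A call would look like `tokenizer, model = load_variant("nf4")`, with "nf4" replaced by the actual directory name.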

Organization

This repository is maintained by ArkAiLab under the Open4bits initiative.

ArkAiLab (Main Organization): https://huggingface.co/ArkAiLab-Adl

Open4bits (Quantization Projects): https://huggingface.co/Open4bits

License

This repository follows the same license as the original LiquidAI model.

Please refer to the original model repository for full licensing details.

Disclaimer

This is an unofficial quantized release.

All credit for the original model architecture and training goes to LiquidAI.

