Open4bits / LFM2.5-1.2B-Base-Quantized
This repository provides multiple quantized variants of the LFM 2.5 Base (1.2B parameters) model for efficient inference and deployment.
The original model is developed and released by LiquidAI: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base
These quantizations are maintained and published by ArkAiLab under the Open4bits organization to improve accessibility across a wide range of hardware.
Available Quantization Formats
Each format is stored in a separate directory:
- FP16: Baseline half-precision weights
- FP8: High-performance low-precision format (GPU support required)
- INT8: Balanced performance and memory usage (BitsAndBytes)
- NF4 (4-bit): Maximum compression using BitsAndBytes double quantization
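As a rough guide to what each format costs in memory, the weights-only footprint can be estimated from the parameter count and the storage width per parameter. The sketch below assumes 2 bytes per parameter for FP16, 1 byte for FP8 and INT8, and 0.5 bytes for NF4; these widths and the `estimate_weight_gb` helper are illustrative assumptions, not figures published for this repository. Real usage will be higher, since activations, caches, quantization metadata, and any layers left unquantized are not counted.

```python
# Rough weights-only memory estimate per format.
# Assumption: "~1.2B" parameters, taken from the model card; the true count may differ.
PARAM_COUNT = 1.2e9

# Assumed storage cost per parameter, in bytes (illustrative, not measured).
BYTES_PER_PARAM = {
    "FP16": 2.0,
    "FP8": 1.0,
    "INT8": 1.0,
    "NF4": 0.5,
}


def estimate_weight_gb(fmt: str, params: float = PARAM_COUNT) -> float:
    """Return the approximate weight size in GB (10^9 bytes) for a format."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: ~{estimate_weight_gb(fmt):.1f} GB")
```

Under these assumptions the estimate is about 2.4 GB for FP16, 1.2 GB for FP8 and INT8, and 0.6 GB for NF4.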
Model Information
- Model Name: LFM 2.5 Base
- Parameters: ~1.2B
- Architecture: Custom LiquidAI architecture
- Original Author: LiquidAI
- Quantized By: ArkAiLab (Open4bits)
This model requires trust_remote_code=True when loading.
Quantization Details
- Quantized using PyTorch and Hugging Face Transformers
- INT8 and NF4 formats use BitsAndBytes
- FP8 provided where hardware support allows
- No GPTQ, AWQ, or llama.cpp used
- Safe for Google Colab and Kaggle
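For readers who want to see what a BitsAndBytes setup of this kind can look like, here is a minimal sketch using `BitsAndBytesConfig` from Transformers. The exact settings used to produce the files in this repository are not stated, so the values below are assumptions, and the helper names (`nf4_config_kwargs`, `int8_config_kwargs`, `build_bnb_config`) are hypothetical.

```python
def nf4_config_kwargs() -> dict:
    """Assumed settings for 4-bit NF4 with double quantization."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    }


def int8_config_kwargs() -> dict:
    """Assumed settings for 8-bit (INT8) loading."""
    return {"load_in_8bit": True}


def build_bnb_config(kwargs: dict):
    """Build a BitsAndBytesConfig from a kwargs dict."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**kwargs)


# Example: quantize the original model on the fly. This needs a CUDA GPU,
# the bitsandbytes package, and network access, so it is left commented out.
#
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "LiquidAI/LFM2.5-1.2B-Base",
#     quantization_config=build_bnb_config(nf4_config_kwargs()),
#     device_map="auto",
#     trust_remote_code=True,
# )
```

Keeping the settings in plain dictionaries makes it easy to compare the INT8 and NF4 configurations side by side before any model is loaded.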
Usage Example
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Open4bits/LFM2.5-1.2B-Base-Quantized"

# trust_remote_code=True is required for this model.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

# device_map="auto" places the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto"
)

# Tokenize a prompt, generate up to 50 new tokens, and decode the result.
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
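Because each format is stored in a separate directory, loading a specific variant may require pointing `from_pretrained` at that directory through its `subfolder` argument. The sketch below assumes the directories are named `FP16`, `FP8`, `INT8`, and `NF4`, and that each one also contains the tokenizer files; both are assumptions, so check the repository file listing for the actual layout. The `load_kwargs` and `load_variant` helpers are hypothetical.

```python
MODEL_ID = "Open4bits/LFM2.5-1.2B-Base-Quantized"

# Hypothetical directory names; verify them against the repository file listing.
VARIANT_DIRS = {"fp16": "FP16", "fp8": "FP8", "int8": "INT8", "nf4": "NF4"}


def load_kwargs(variant: str) -> dict:
    """Return from_pretrained keyword arguments for the chosen variant."""
    key = variant.lower()
    if key not in VARIANT_DIRS:
        raise ValueError(
            f"Unknown variant {variant!r}; expected one of {sorted(VARIANT_DIRS)}"
        )
    return {
        "subfolder": VARIANT_DIRS[key],
        "trust_remote_code": True,  # required by this model
        "device_map": "auto",
    }


def load_variant(variant: str):
    """Load the tokenizer and model for one variant (needs network access)."""
    # Imported lazily so load_kwargs can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = load_kwargs(variant)
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID,
        subfolder=kwargs["subfolder"],  # assumes tokenizer files live here too
        trust_remote_code=True,
    )
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **kwargs)
    return tokenizer, model


# Example (downloads weights; INT8 and NF4 also need bitsandbytes and a GPU):
# tokenizer, model = load_variant("nf4")
```

If the tokenizer files sit at the repository root instead, drop the `subfolder` argument from the tokenizer call and keep it only for the model.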
Organization
This repository is maintained by ArkAiLab under the Open4bits initiative.
ArkAiLab (Main Organization): https://huggingface.co/ArkAiLab-Adl
Open4bits (Quantization Projects): https://huggingface.co/Open4bits
License
This repository follows the same license as the original LiquidAI model.
Please refer to the original model repository for full licensing details.
Disclaimer
This is an unofficial quantized release.
All credit for the original model architecture and training goes to LiquidAI.