# OLMoE-1B-7B DPO with DoRA (Merged)
This is the merged version of demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo - a preference-aligned OLMoE model trained with DoRA and DPO.
## What's This?
A fully merged model ready for production deployment. The DoRA adapter has been merged into the base OLMoE-1B-7B weights for:
- ✅ Faster inference (no adapter overhead)
- ✅ vLLM compatibility
- ✅ Simpler deployment
- ✅ Production-ready
Training pipeline:
- SFT on 20K examples from OpenHermes-2.5
- DPO on 10K preference pairs from UltraFeedback
- Merged the DoRA adapter into the base weights (a reproduction sketch follows)
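The merge step can be reproduced with PEFT's `merge_and_unload()`. A minimal sketch, assuming the adapter repo was trained directly on top of the base checkpoint listed under Model Details; the output directory name is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16, attach the DoRA adapter, then fold it in
base = AutoModelForCausalLM.from_pretrained(
    "1024m/OLMoE-1B-7B-0924-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo")
merged = model.merge_and_unload()  # DoRA deltas are merged into the base weights

merged.save_pretrained("olmoe-dora-dpo-merged")  # illustrative output path
AutoTokenizer.from_pretrained(
    "demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo"
).save_pretrained("olmoe-dora-dpo-merged")
```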
## Quick Start

### vLLM (Recommended)
```bash
# Serve
vllm serve demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged \
    --max-model-len 4096 \
    --dtype bfloat16

# Inference
curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged",
        "messages": [
            {"role": "user", "content": "Explain machine learning in simple terms."}
        ],
        "max_tokens": 200,
        "temperature": 0.7
    }' | jq -r '.choices[0].message.content'
```
### Python with Transformers
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged")
model = AutoModelForCausalLM.from_pretrained(
    "demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [{"role": "user", "content": "What is quantum computing?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    # `generate` takes `max_new_tokens`; sampling must be enabled for `temperature` to apply
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Python with OpenAI Client
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy",
)

response = client.chat.completions.create(
    model="demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged",
    messages=[
        {"role": "user", "content": "Write a Python function to calculate fibonacci numbers."}
    ],
    max_tokens=300,
    temperature=0.7,
)
print(response.choices[0].message.content)
```
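The same endpoint also supports streaming through the standard OpenAI client interface. A small variant of the call above (the prompt is just an example):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

# Streaming variant: print tokens as they arrive
stream = client.chat.completions.create(
    model="demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged",
    messages=[{"role": "user", "content": "Summarize the theory of relativity in two sentences."}],
    max_tokens=200,
    temperature=0.7,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```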
## Model Details
| Parameter | Value |
|---|---|
| Architecture | OLMoE (Mixture of Experts) |
| Parameters | ~1B active, 7B total |
| Precision | bfloat16 |
| Context Length | 4096 tokens |
| Training | SFT + DPO with DoRA adapters |
| Base Model | 1024m/OLMoE-1B-7B-0924-Base |
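The active-vs-total parameter split comes from the MoE routing configuration. If you want to confirm the numbers locally, the relevant fields are exposed on the model config (attribute names follow the `transformers` OLMoE config class):

```python
from transformers import AutoConfig

# Inspect the MoE configuration behind the "~1B active, 7B total" figures
cfg = AutoConfig.from_pretrained("demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged")
print(cfg.model_type)               # expected: "olmoe"
print(cfg.num_experts)              # total experts per MoE layer
print(cfg.num_experts_per_tok)      # experts routed per token (drives active params)
print(cfg.max_position_embeddings)  # context length
```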
## Training Details
- Adapter Type: DoRA (Weight-Decomposed LoRA)
- LoRA Rank: 16
- Target Modules: q_proj, v_proj
- Quantization during training: 4-bit NF4
- DPO Beta: 0.1
- Learning Rate: 5e-5
- Hardware: 2× NVIDIA A40 80GB
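For reference, these hyperparameters map onto a PEFT + TRL setup roughly as sketched below. This is a hedged reconstruction, not the released training script: the dataset slice, the preceding SFT stage, and the exact TRL version (`processing_class` vs. the older `tokenizer` argument) are assumptions.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# 4-bit NF4 quantization during training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# In practice the SFT checkpoint from stage 1; the base path is shown as a placeholder
model = AutoModelForCausalLM.from_pretrained(
    "1024m/OLMoE-1B-7B-0924-Base",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1024m/OLMoE-1B-7B-0924-Base")

# 10K preference pairs from UltraFeedback (subset size per this card)
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs").select(range(10_000))

# DoRA adapter: rank 16 on the attention q/v projections
peft_config = LoraConfig(r=16, target_modules=["q_proj", "v_proj"], use_dora=True, task_type="CAUSAL_LM")

# DPO with beta 0.1 and learning rate 5e-5
args = DPOConfig(beta=0.1, learning_rate=5e-5, output_dir="olmoe-dora-dpo")

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```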
## Chat Template
```
User:
<message>
Assistant:
<response>
```

Roles supported: system, user, assistant
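To see exactly how these roles are rendered, you can print the templated prompt for a sample conversation; this only relies on `apply_chat_template`, which is already used in the Transformers example above:

```python
from transformers import AutoTokenizer

# Render the chat template for a sample conversation to inspect the prompt format
tokenizer = AutoTokenizer.from_pretrained("demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False))
```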
## Why Use the Merged Version?
- Performance: No adapter overhead during inference
- Compatibility: Works with vLLM, TGI, and other optimized serving frameworks
- Simplicity: Single model file, no need to load base + adapter separately
- Production-Ready: Optimized for deployment at scale
## Adapter Version
Looking for the lightweight adapter weights? Check out demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo (~8.3MB)
## License
Apache 2.0. Please also check the license of the base model.
## Citation
If you use this model, please cite:
- Base Model: 1024m/OLMoE-1B-7B-0924-Base
- OpenHermes-2.5: teknium/OpenHermes-2.5
- UltraFeedback: HuggingFaceH4/ultrafeedback_binarized
- TRL: HuggingFace TRL