OLMoE-1B-7B DPO with DoRA (Merged)

This is the merged version of demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo, a preference-aligned OLMoE model trained with DoRA and DPO.

What's This?

A fully merged model ready for production deployment. The DoRA adapter has been merged into the base OLMoE-1B-7B weights for:

  • ✅ Faster inference (no adapter overhead)
  • ✅ vLLM compatibility
  • ✅ Simpler deployment
  • ✅ Production-ready

Training pipeline:

  1. SFT on 20K examples from OpenHermes-2.5
  2. DPO on 10K preference pairs from UltraFeedback
  3. Merged the DoRA adapter into the base weights (see the merge sketch below)
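
Merging the adapter is a one-time step. Below is a minimal sketch of how such a merge is typically done with peft; the exact merge script is not included in this card, and the repo ids are taken from the sections further down.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16
base = AutoModelForCausalLM.from_pretrained(
    "1024m/OLMoE-1B-7B-0924-Base",  # base model listed under Model Details
    torch_dtype=torch.bfloat16,
)

# Attach the DoRA adapter produced by the SFT + DPO stages
model = PeftModel.from_pretrained(base, "demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo")

# Fold the adapter weights into the base parameters and drop the PEFT wrappers
merged = model.merge_and_unload()
merged.save_pretrained("olmoe-dora-dpo-merged")

tokenizer = AutoTokenizer.from_pretrained("demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo")
tokenizer.save_pretrained("olmoe-dora-dpo-merged")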

Quick Start

vLLM (Recommended)

# Serve
vllm serve demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged \
  --max-model-len 4096 \
  --dtype bfloat16

# Inference
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged",
    "messages": [
      {"role": "user", "content": "Explain machine learning in simple terms."}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }' | jq -r '.choices[0].message.content'

Python with Transformers

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged")
model = AutoModelForCausalLM.from_pretrained(
    "demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

messages = [{"role": "user", "content": "What is quantum computing?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
    
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Python with OpenAI Client

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy"
)

response = client.chat.completions.create(
    model="demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged",
    messages=[
        {"role": "user", "content": "Write a Python function to calculate fibonacci numbers."}
    ],
    max_tokens=300,
    temperature=0.7
)

print(response.choices[0].message.content)

Model Details

  • Architecture: OLMoE (Mixture of Experts)
  • Parameters: ~1B active, 7B total
  • Precision: bfloat16
  • Context Length: 4096 tokens
  • Training: SFT + DPO with DoRA adapters
  • Base Model: 1024m/OLMoE-1B-7B-0924-Base
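
You can confirm these numbers from the model's config. A quick sketch (attribute names follow the OLMoE configuration class in recent transformers releases and may differ in older versions):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged")
print(cfg.model_type)                                   # expected: "olmoe"
print(getattr(cfg, "num_experts", "n/a"))               # experts per MoE layer
print(getattr(cfg, "num_experts_per_tok", "n/a"))       # experts activated per token
print(getattr(cfg, "max_position_embeddings", "n/a"))   # context length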

Training Details

  • Adapter Type: DoRA (Weight-Decomposed LoRA)
  • LoRA Rank: 16
  • Target Modules: q_proj, v_proj
  • Quantization during training: 4-bit NF4
  • DPO Beta: 0.1
  • Learning Rate: 5e-5
  • Hardware: 2× NVIDIA A40 80GB
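
For reference, here is a minimal sketch of how the hyperparameters above map onto peft and trl configuration objects. The actual training script is not part of this card; lora_alpha is an assumption, and argument names may vary across peft/trl versions.

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import DPOConfig

# 4-bit NF4 quantization used during training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# DoRA adapter: rank 16 on q_proj / v_proj
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,                        # assumption: alpha is not listed in this card
    target_modules=["q_proj", "v_proj"],
    use_dora=True,                        # DoRA: weight-decomposed LoRA
    task_type="CAUSAL_LM",
)

# DPO stage
dpo_config = DPOConfig(
    beta=0.1,
    learning_rate=5e-5,
    bf16=True,
    output_dir="olmoe-dora-dpo",
)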

Chat Template

User:
<message>

Assistant:
<response>

Roles supported: system, user, assistant
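
To see exactly how messages are rendered, you can print the template output yourself (a small sketch; the rendered string depends on the chat template shipped with the tokenizer):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged")
prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)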

Why Use the Merged Version?

  • Performance: No adapter overhead during inference
  • Compatibility: Works with vLLM, TGI, and other optimized serving frameworks
  • Simplicity: Single model file, no need to load base + adapter separately
  • Production-Ready: Optimized for deployment at scale

Adapter Version

Looking for the lightweight adapter weights? Check out demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo (~8.3MB).

License

Apache 2.0. Please also check the license of the base model.

Citation

If you use this model, please cite:
