demonlxrd committed
Commit f1670d5 · verified · 1 Parent(s): 4abcd51

Upload merged OLMoE-1B-7B with DoRA DPO

README.md ADDED
@@ -0,0 +1,162 @@
+ ---
+ base_model: 1024m/OLMoE-1B-7B-0924-Base
+ library_name: transformers
+ pipeline_tag: text-generation
+ license: apache-2.0
+ tags:
+ - dpo
+ - dora
+ - qlora
+ - olmoe
+ - alignment
+ - preference-learning
+ - merged
+ datasets:
+ - teknium/OpenHermes-2.5
+ - HuggingFaceH4/ultrafeedback_binarized
+ language:
+ - en
+ ---
+
+ # OLMoE-1B-7B DPO with DoRA (Merged)
+
+ This is the **merged** version of [demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo](https://huggingface.co/demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo) - a preference-aligned OLMoE model trained with DoRA and DPO.
+
+ ## What's This?
+
+ A fully merged model ready for production deployment. The DoRA adapter has been merged into the base [OLMoE-1B-7B](https://huggingface.co/1024m/OLMoE-1B-7B-0924-Base) weights for:
+ - ✅ Faster inference (no adapter overhead)
+ - ✅ vLLM compatibility
+ - ✅ Simpler deployment
+ - ✅ Production-ready
+
+ Training pipeline:
+ 1. **SFT** on 20K examples from [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
+ 2. **DPO** on 10K preference pairs from [UltraFeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
+ 3. **Merged** DoRA adapter into base weights
+
+ ## Quick Start
+
+ ### vLLM (Recommended)
+
+ ```bash
+ # Serve
+ vllm serve demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged \
+   --max-model-len 4096 \
+   --dtype bfloat16
+
+ # Inference
+ curl -s http://localhost:8000/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged",
+     "messages": [
+       {"role": "user", "content": "Explain machine learning in simple terms."}
+     ],
+     "max_tokens": 200,
+     "temperature": 0.7
+   }' | jq -r '.choices[0].message.content'
+ ```
+
+ ### Python with Transformers
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged")
+ model = AutoModelForCausalLM.from_pretrained(
+     "demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged",
+     device_map="auto",
+     torch_dtype=torch.bfloat16
+ )
+
+ messages = [{"role": "user", "content": "What is quantum computing?"}]
+ prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+ with torch.inference_mode():
+     outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ### Python with OpenAI Client
+
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(
+     base_url="http://localhost:8000/v1",
+     api_key="dummy"
+ )
+
+ response = client.chat.completions.create(
+     model="demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo-merged",
+     messages=[
+         {"role": "user", "content": "Write a Python function to calculate fibonacci numbers."}
+     ],
+     max_tokens=300,
+     temperature=0.7
+ )
+
+ print(response.choices[0].message.content)
+ ```
+
+ ## Model Details
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Architecture | OLMoE (Mixture of Experts) |
+ | Parameters | ~1B active, 7B total |
+ | Precision | bfloat16 |
+ | Context Length | 4096 tokens |
+ | Training | SFT + DPO with DoRA adapters |
+ | Base Model | 1024m/OLMoE-1B-7B-0924-Base |
+
+ ## Training Details
+
+ - **Adapter Type**: DoRA (Weight-Decomposed LoRA)
+ - **LoRA Rank**: 16
+ - **Target Modules**: q_proj, v_proj
+ - **Quantization during training**: 4-bit NF4
+ - **DPO Beta**: 0.1
+ - **Learning Rate**: 5e-5
+ - **Hardware**: 2× NVIDIA A40 80GB
+
+ ## Chat Template
+
+ ```
+ User:
+ <message>
+
+
+ Assistant:
+ <response>
+ ```
+
+ Roles supported: `system`, `user`, `assistant`, `tool`
+
+ ## Why Use the Merged Version?
+
+ - **Performance**: No adapter overhead during inference
+ - **Compatibility**: Works with vLLM, TGI, and other optimized serving frameworks
+ - **Simplicity**: One self-contained checkpoint (three safetensors shards), no need to load base + adapter separately
+ - **Production-Ready**: Optimized for deployment at scale
+
+ ## Adapter Version
+
+ Looking for the lightweight adapter weights? Check out [demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo](https://huggingface.co/demonlxrd/olmoe-openhermes-ultrafeedback-dora-dpo) (~8.3MB)
+
+ ## License
+
+ Apache 2.0. Please also check the license of the [base model](https://huggingface.co/1024m/OLMoE-1B-7B-0924-Base).
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ - **Base Model**: [1024m/OLMoE-1B-7B-0924-Base](https://huggingface.co/1024m/OLMoE-1B-7B-0924-Base)
+ - **OpenHermes-2.5**: [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
+ - **UltraFeedback**: [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
+ - **TRL**: [HuggingFace TRL](https://github.com/huggingface/trl)
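The README's claim that merging removes adapter overhead can be illustrated with a toy example. The sketch below is not the author's merge script: it assumes the usual DoRA formulation, in which the adapted weight is a learned magnitude times the row-normalized sum of the base weight and the low-rank update, with weights stored as (out_features, in_features). The function names, the `scaling` argument, and the tiny matrices are all hypothetical.

```python
import math


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dora_merge(w0, lora_b, lora_a, magnitude, scaling=1.0):
    """Fold the adapter into one dense matrix: m * (W0 + s*B*A) / row_norm."""
    delta = matmul(lora_b, lora_a)
    merged = []
    for w_row, d_row, m in zip(w0, delta, magnitude):
        directed = [w + scaling * d for w, d in zip(w_row, d_row)]
        norm = math.sqrt(sum(v * v for v in directed))
        merged.append([m * v / norm for v in directed])
    return merged


def adapter_forward(x, w0, lora_b, lora_a, magnitude, scaling=1.0):
    """Unmerged path: base matmul, low-rank matmul, and a norm on every call."""
    base_out = matvec(w0, x)
    lora_out = matvec(lora_b, matvec(lora_a, x))
    delta = matmul(lora_b, lora_a)
    norms = [
        math.sqrt(sum((w + scaling * d) ** 2 for w, d in zip(w_row, d_row)))
        for w_row, d_row in zip(w0, delta)
    ]
    return [m / n * (b + scaling * l)
            for m, n, b, l in zip(magnitude, norms, base_out, lora_out)]


# Toy shapes: out=2, in=3, rank=1 (illustrative values only).
w0 = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
lora_b = [[0.5], [-1.0]]
lora_a = [[1.0, 2.0, 0.0]]
magnitude = [2.0, 1.5]
x = [1.0, -1.0, 0.5]

merged = dora_merge(w0, lora_b, lora_a, magnitude)
y_merged = matvec(merged, x)
y_adapter = adapter_forward(x, w0, lora_b, lora_a, magnitude)
print(y_merged, y_adapter)
```

Both paths give the same output, but the merged path needs only a single matrix product per layer, which is why a merged checkpoint can be served by engines that know nothing about adapters.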
chat_template.jinja ADDED
@@ -0,0 +1,22 @@
+
+ {% set sep = '\n\n' -%}
+ {% if bos_token is defined %}{{ bos_token }}{% endif -%}
+ {%- for m in messages -%}
+ {%- if m['role'] == 'system' -%}
+ System:
+ {{ m['content'] | trim }}{{ sep }}
+ {%- elif m['role'] == 'user' -%}
+ User:
+ {{ m['content'] | trim }}{{ sep }}
+ {%- elif m['role'] == 'assistant' -%}
+ Assistant:
+ {{ m['content'] | trim }}{{ sep }}
+ {%- elif m['role'] == 'tool' -%}
+ Tool:
+ {{ m['content'] | trim }}{{ sep }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- if add_generation_prompt -%}
+ Assistant:
+ {%- endif -%}
+
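To make the template's effect concrete, here is a plain-Python approximation of the prompt it appears to produce. This is a sketch under assumptions: `render_prompt`, `ROLE_LABELS`, and `SEP` are hypothetical names, and leading whitespace or BOS handling in the real Jinja rendering may differ slightly. The authoritative output is whatever `tokenizer.apply_chat_template` returns.

```python
# Labels emitted by the four role branches in chat_template.jinja.
ROLE_LABELS = {
    "system": "System",
    "user": "User",
    "assistant": "Assistant",
    "tool": "Tool",
}
SEP = "\n\n"  # mirrors `{% set sep = '\n\n' %}` in the template


def render_prompt(messages, add_generation_prompt=True, bos_token=""):
    """Approximate the template: 'Label:' line, trimmed content, blank-line separator."""
    parts = [bos_token]
    for message in messages:
        label = ROLE_LABELS.get(message["role"])
        if label is None:
            continue  # the template has no branch for other roles, so they emit nothing
        parts.append(f"{label}:\n{message['content'].strip()}{SEP}")
    if add_generation_prompt:
        parts.append("Assistant:")  # cue the model to answer
    return "".join(parts)


prompt = render_prompt([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "  What is quantum computing?  "},
])
print(prompt)
```

Comparing this string against the output of `apply_chat_template(..., tokenize=False)` is a quick way to check that a serving stack applies the same template.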
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "architectures": [
+     "OlmoeForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "clip_qkv": null,
+   "dtype": "bfloat16",
+   "eos_token_id": 50279,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "initializer_range": 0.02,
+   "intermediate_size": 1024,
+   "max_position_embeddings": 4096,
+   "model_type": "olmoe",
+   "norm_topk_prob": false,
+   "num_attention_heads": 16,
+   "num_experts": 64,
+   "num_experts_per_tok": 8,
+   "num_hidden_layers": 16,
+   "num_key_value_heads": 16,
+   "output_router_logits": false,
+   "pad_token_id": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "router_aux_loss_coef": 0.01,
+   "tie_word_embeddings": false,
+   "transformers_version": "4.56.1",
+   "use_cache": true,
+   "vocab_size": 50304
+ }
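The "~1B active, 7B total" figure in the model card can be cross-checked against this config. The sketch below is a back-of-the-envelope estimate, not an official count: it assumes four bias-free attention projections per layer, three projection matrices per expert (gate, up, down), a linear router, and untied input/output embeddings, and it ignores the small normalization weights. `count_params` is a hypothetical helper.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 2048,
    "intermediate_size": 1024,
    "num_hidden_layers": 16,
    "num_experts": 64,
    "num_experts_per_tok": 8,
    "vocab_size": 50304,
    "tie_word_embeddings": False,
}


def count_params(cfg, experts_used):
    """Estimate parameters when `experts_used` experts per layer are counted."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    attention = 4 * h * h            # q, k, v, o projections (assumed bias-free)
    router = h * cfg["num_experts"]  # one routing logit per expert
    expert = 3 * h * i               # gate, up, down projections (assumed)
    per_layer = attention + router + experts_used * expert
    embedding_copies = 1 if cfg["tie_word_embeddings"] else 2
    embeddings = embedding_copies * cfg["vocab_size"] * h
    return cfg["num_hidden_layers"] * per_layer + embeddings


total = count_params(config, config["num_experts"])
active = count_params(config, config["num_experts_per_tok"])
print(f"total ~ {total / 1e9:.2f}B, active per token ~ {active / 1e9:.2f}B")
```

Under these assumptions the estimate comes to roughly 6.9B total parameters, of which roughly 1.3B are used for any single token, since only 8 of the 64 experts in each layer are routed to.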
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "eos_token_id": 50279,
+   "pad_token_id": 1,
+   "transformers_version": "4.56.1"
+ }
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8f28aeaae685d658c68276a67e724ea261568b0bcd02fe08d663997c900a8f31
+ size 4997744872
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49ec291a1b529d63c642776e8ca870614ccf9cff23de6efb1f32d1f5a403038e
+ size 4997235176
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f9c6c65aa1094abfd59b47854dfcdd8d00e6b25c6573189fb3d554ede876fa8
+ size 3843741912
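The three LFS pointer sizes give a quick sanity check on the checkpoint. A minimal sketch, assuming every tensor is stored as 2-byte bfloat16 (as the config's `dtype` suggests); the small safetensors headers are included in the byte counts, so the parameter figure is a slight overestimate.

```python
# Sizes in bytes, copied from the three LFS pointer files above.
shard_bytes = [4997744872, 4997235176, 3843741912]
BYTES_PER_BF16 = 2  # assumption: all weights are stored as bfloat16

total_bytes = sum(shard_bytes)
approx_params = total_bytes // BYTES_PER_BF16

print(f"{total_bytes / 1e9:.2f} GB on disk, about {approx_params / 1e9:.2f}B parameters")
```

The result, a little under 14 GB and about 6.9B parameters, is in line with the 7B total stated in the model card, and gives a lower bound on the memory needed to hold the weights in bfloat16.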
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|padding|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
+ {
+   "add_bos_token": false,
+   "add_eos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "|||IP_ADDRESS|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "1": {
+       "content": "<|padding|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50254": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50255": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50256": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50257": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50258": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50259": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50260": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50261": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50262": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50263": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50264": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50265": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50266": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50267": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50268": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50269": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50270": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50271": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50272": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50273": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50274": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50275": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50276": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50277": {
+       "content": "|||EMAIL_ADDRESS|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50278": {
+       "content": "|||PHONE_NUMBER|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50279": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": null,
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "<|endoftext|>",
+   "extra_special_tokens": {},
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<|padding|>",
+   "tokenizer_class": "GPTNeoXTokenizer",
+   "unk_token": null
+ }