Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

deepseek-coder-1.3b-typescript - GGUF
- Model creator: https://huggingface.co/CodeGPTPlus/
- Original model: https://huggingface.co/CodeGPTPlus/deepseek-coder-1.3b-typescript/
| Name | Quant method | Size |
| ---- | ---- | ---- |
| [deepseek-coder-1.3b-typescript.Q2_K.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q2_K.gguf) | Q2_K | 0.52GB |
| [deepseek-coder-1.3b-typescript.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.IQ3_XS.gguf) | IQ3_XS | 0.57GB |
| [deepseek-coder-1.3b-typescript.IQ3_S.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.IQ3_S.gguf) | IQ3_S | 0.6GB |
| [deepseek-coder-1.3b-typescript.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q3_K_S.gguf) | Q3_K_S | 0.6GB |
| [deepseek-coder-1.3b-typescript.IQ3_M.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.IQ3_M.gguf) | IQ3_M | 0.63GB |
| [deepseek-coder-1.3b-typescript.Q3_K.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q3_K.gguf) | Q3_K | 0.66GB |
| [deepseek-coder-1.3b-typescript.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q3_K_M.gguf) | Q3_K_M | 0.66GB |
| [deepseek-coder-1.3b-typescript.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q3_K_L.gguf) | Q3_K_L | 0.69GB |
| [deepseek-coder-1.3b-typescript.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.IQ4_XS.gguf) | IQ4_XS | 0.7GB |
| [deepseek-coder-1.3b-typescript.Q4_0.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q4_0.gguf) | Q4_0 | 0.72GB |
| [deepseek-coder-1.3b-typescript.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.IQ4_NL.gguf) | IQ4_NL | 0.73GB |
| [deepseek-coder-1.3b-typescript.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q4_K_S.gguf) | Q4_K_S | 0.76GB |
| [deepseek-coder-1.3b-typescript.Q4_K.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q4_K.gguf) | Q4_K | 0.81GB |
| [deepseek-coder-1.3b-typescript.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q4_K_M.gguf) | Q4_K_M | 0.81GB |
| [deepseek-coder-1.3b-typescript.Q4_1.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q4_1.gguf) | Q4_1 | 0.8GB |
| [deepseek-coder-1.3b-typescript.Q5_0.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q5_0.gguf) | Q5_0 | 0.87GB |
| [deepseek-coder-1.3b-typescript.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q5_K_S.gguf) | Q5_K_S | 0.89GB |
| [deepseek-coder-1.3b-typescript.Q5_K.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q5_K.gguf) | Q5_K | 0.93GB |
| [deepseek-coder-1.3b-typescript.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q5_K_M.gguf) | Q5_K_M | 0.93GB |
| [deepseek-coder-1.3b-typescript.Q5_1.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q5_1.gguf) | Q5_1 | 0.95GB |
| [deepseek-coder-1.3b-typescript.Q6_K.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q6_K.gguf) | Q6_K | 1.09GB |
| [deepseek-coder-1.3b-typescript.Q8_0.gguf](https://huggingface.co/RichardErkhov/CodeGPTPlus_-_deepseek-coder-1.3b-typescript-gguf/blob/main/deepseek-coder-1.3b-typescript.Q8_0.gguf) | Q8_0 | 1.33GB |
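Which file to pick usually comes down to available memory: larger quants generally preserve more quality. As a rough illustration (not part of this repository's tooling), the sizes from the table above can drive a simple selection helper; an actual download would then use something like `huggingface_hub.hf_hub_download` with the chosen filename.

```python
# Pick the largest (usually highest-quality) quant that fits a size budget.
# Sizes in GB are copied from the table above; the helper itself is illustrative.
QUANT_SIZES_GB = {
    "Q2_K": 0.52, "IQ3_XS": 0.57, "Q3_K_M": 0.66, "Q4_0": 0.72,
    "Q4_K_M": 0.81, "Q5_K_M": 0.93, "Q6_K": 1.09, "Q8_0": 1.33,
}

def pick_quant(budget_gb: float) -> str:
    """Return the name of the largest quant file not exceeding budget_gb."""
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget_gb}
    if not fitting:
        raise ValueError(f"no quant fits in {budget_gb} GB")
    return max(fitting, key=fitting.get)

print(pick_quant(1.0))  # Q5_K_M (0.93GB fits a 1 GB budget)
```

Note that on-disk file size is only a lower bound on runtime memory; context length and KV cache add overhead on top.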

Original model description:
---
license: other
base_model: deepseek-ai/deepseek-coder-1.3b-base
tags:
- axolotl
- generated_from_trainer
model-index:
- name: deepseek-coder-1.3b-typescript
  results: []
datasets:
- bigcode/the-stack-dedup
widget:
- text: "class Person {\n constructor(public name:"
  example_title: "class"
- text: "function quickSort"
  example_title: "function"
---
61
+
62
+ <p align="center">
63
+ <img width="1000px" alt="CodeGPT: DeepSeek Coder - Typescript" src="codegpt-deepseek-typescript.png?raw=true">
64
+ </p>
65
+ <p align="center"><a href="https://codegpt.co/">[CodeGPT.co]</a> | <a href="https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript">[🦙 Ollama]</a> | <a href="https://discord.gg/fKyyJX5pne">[Discord]</a> | <a href="https://marketplace.visualstudio.com/items?itemName=DanielSanMedium.dscodegpt">[VSCode Extension]</a> </p>
66
+ <hr>
67
+
68
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
69
+ <details><summary>See axolotl config</summary>
70
+
71
+ axolotl version: `0.3.0`
72
+ ```yaml
73
+ base_model: deepseek-ai/deepseek-coder-1.3b-base
74
+ model_type: AutoModelForCausalLM
75
+ trust_remote_code: true
76
+ load_in_8bit: false
77
+ load_in_4bit: false
78
+ strict: false
79
+
80
+
81
+ datasets:
82
+ - path: CodeGPTPlus/typescript-0-500000-seq1024
83
+ type: completion
84
+ field: text
85
+
86
+
87
+ val_set_size: 0.001
88
+ output_dir: ./fft-out
89
+
90
+ sequence_len: 1024
91
+
92
+ adapter:
93
+ lora_model_dir:
94
+ lora_r:
95
+ lora_alpha:
96
+ lora_dropout:
97
+ lora_target_linear:
98
+ lora_fan_in_fan_out:
99
+ lora_modules_to_save:
100
+
101
+ wandb_project: deepseek_1.3_fft
102
+ wandb_entity:
103
+ wandb_watch:
104
+ wandb_name: aws_a10g
105
+ wandb_log_model: end
106
+
107
+
108
+ gradient_accumulation_steps: 2
109
+ micro_batch_size: 20
110
+ num_epochs: 1
111
+ optimizer: adamw_bnb_8bit
112
+ adam_beta1: 0.9
113
+ adam_beta2: 0.999
114
+ adam_epsilon: 0.000001
115
+ max_grad_norm: 1.0
116
+ weight_decay: 0.1
117
+ lr_scheduler: cosine
118
+ learning_rate: 0.00002
119
+ train_on_inputs: false
120
+ group_by_length: false
121
+ bf16: true
122
+ fp16: false
123
+ tf32: false
124
+ gradient_checkpointing: true
125
+ early_stopping_patience:
126
+ resume_from_checkpoint:
127
+ local_rank:
128
+ logging_steps: 1
129
+ xformers_attention:
130
+ flash_attention: true
131
+
132
+ loss_watchdog_threshold: 5.0
133
+ loss_watchdog_patience: 3
134
+
135
+ hub_model_id: CodeGPTPlus/deepseek_coder_1.3b_typescript
136
+ hub_strategy: every_save
137
+ warmup_ratio: 0.01
138
+ evals_per_epoch: 20
139
+ saves_per_epoch: 3
140
+ debug:
141
+ deepspeed:
142
+
143
+ fsdp:
144
+ fsdp_config:
145
+ special_tokens:
146
+ bos_token: "<|begin▁of▁sentence|>"
147
+ eos_token: "<|end▁of▁sentence|>"
148
+ pad_token: "<|end▁of▁sentence|>"
149
+ ```
150
+
151
+ </details><br>

# deepseek-coder-1.3b-typescript

CodeGPTPlus/deepseek-coder-1.3b-typescript is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base), built by the CodeGPT team to generate expert TypeScript code. Fine-tuned on a 0.5B-token TypeScript dataset, it produces precise and efficient solutions in that language.

A 16K context window and an additional fill-in-the-middle (FIM) task are used to deliver project-level code completion.

This makes the model a strong choice for anyone seeking a code generator specialized in TypeScript, backed by the expertise of the CodeGPT team.

It achieves the following results on the evaluation set:
- Loss: 0.7681

**Model Developers** CodeGPT Team

**Variations** 1.3B

**Input** The model accepts text input only.

**Output** The model generates text only.

## How to Use
This model is intended for code completion only. Below are some examples of how to use it.

#### Running the model on a GPU
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "CodeGPTPlus/deepseek-coder-1.3b-typescript", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "CodeGPTPlus/deepseek-coder-1.3b-typescript", trust_remote_code=True
).cuda()

input_text = """<|fim▁begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim▁hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim▁end|>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Running with Ollama
**Model:** https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript

```shell
ollama run codegpt/deepseek-coder-1.3b-typescript
```

### Running with Ollama and CodeGPT Autocomplete in VSCode

**Documentation:** https://docs.codegpt.co/docs/tutorial-features/code_autocompletion

Select "Ollama - codegpt/deepseek-coder-1.3b-typescript" in the autocomplete model selector.

Then write any code or comment in the VSCode editor, and the model will provide code suggestions through CodeGPT autocomplete.

<img width="1000px" alt="CodeGPT: DeepSeek Coder - Typescript" src="ollama_autocomplete_codegpt.gif">

### Fill In the Middle (FIM)
```
<|fim▁begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim▁hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim▁end|>
```
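
The FIM prompt above follows a fixed shape: the code before the cursor goes between `<|fim▁begin|>` and `<|fim▁hole|>`, and the code after the cursor goes between `<|fim▁hole|>` and `<|fim▁end|>`. A minimal sketch of assembling such a prompt (the `build_fim_prompt` helper is our illustration, not part of the model's API; only the three sentinel tokens come from the model card):

```python
# Sentinel tokens as shown in the FIM example above (note the U+2581 character).
FIM_BEGIN = "<|fim▁begin|>"
FIM_HOLE = "<|fim▁hole|>"
FIM_END = "<|fim▁end|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt: the model generates the hole."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"

# Ask the model to complete a TypeScript function body.
prefix = "function add(a: number, b: number): number {\n"
suffix = "\n}"
prompt = build_fim_prompt(prefix, suffix)
print(prompt)
```

The resulting string is passed to the tokenizer exactly as in the GPU example above; the model's output fills the hole between the two code fragments.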

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 261
- num_epochs: 1
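
These values are internally consistent, which is worth a quick sanity check (the arithmetic below uses only the numbers listed above plus `warmup_ratio: 0.01` from the axolotl config; the single-device assumption is ours):

```python
# Effective (total) train batch size = per-device micro batch * grad accumulation,
# assuming a single device.
micro_batch_size = 20
gradient_accumulation_steps = 2
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 40, matching total_train_batch_size above

# warmup_ratio 0.01 with 261 warmup steps implies roughly 26,100 optimizer
# steps over the whole run, consistent with the ~24,852 steps logged at
# epoch 0.95 in the results table below.
approx_total_steps = round(261 / 0.01)
print(approx_total_steps)  # 26100
```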

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.0745 | 0.0 | 1 | 0.8681 |
| 1.2267 | 0.05 | 1308 | 0.8130 |
| 1.1594 | 0.1 | 2616 | 0.8018 |
| 0.7674 | 0.15 | 3924 | 0.7942 |
| 0.6443 | 0.2 | 5232 | 0.7889 |
| 0.9155 | 0.25 | 6540 | 0.7847 |
| 0.7501 | 0.3 | 7848 | 0.7819 |
| 0.8835 | 0.35 | 9156 | 0.7792 |
| 0.7261 | 0.4 | 10464 | 0.7769 |
| 0.9746 | 0.45 | 11772 | 0.7748 |
| 0.6884 | 0.5 | 13080 | 0.7734 |
| 0.6104 | 0.55 | 14388 | 0.7722 |
| 0.8876 | 0.6 | 15696 | 0.7710 |
| 0.9567 | 0.65 | 17004 | 0.7703 |
| 0.6915 | 0.7 | 18312 | 0.7696 |
| 0.8874 | 0.75 | 19620 | 0.7691 |
| 0.6124 | 0.8 | 20928 | 0.7686 |
| 0.8147 | 0.85 | 22236 | 0.7684 |
| 0.8021 | 0.9 | 23544 | 0.7683 |
| 0.8665 | 0.95 | 24852 | 0.7681 |

### Framework versions

- Transformers 4.37.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0