---
license: apache-2.0
datasets:
- PleIAs/SYNTH
language:
- en
---
# honkhazard-3
40.6M (10.49M embed, 16L/8H) | 975M seen
---
a third experiment to train only on synthetic messages!
- parameters: 40.6M (13.11M mlp, 10.49M embed, 10.49M head, 6.55M attn) — see the shape sketch after this list
- tokens seen: 975.2M
- num_layers: 16
- num_heads: 8
- vocab_size: 32768
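the per-component counts above are consistent with a plain decoder-only layout at hidden size 320. a minimal sketch of that back-of-the-envelope check; hidden size, untied embeddings, and the 4x mlp expansion are inferences from the numbers, not the actual training config:
```python
# rough check of the parameter breakdown above, assuming a plain decoder-only
# transformer with untied embeddings; hidden size 320 is inferred from the counts
# (not stated on the card), and the 4x mlp expansion is an assumption that
# happens to reproduce the 13.11M figure
num_layers, num_heads, vocab_size = 16, 8, 32768
hidden = 320  # inferred: 32768 * 320 ≈ 10.49M embedding params

embed = vocab_size * hidden                 # ~10.49M
head = vocab_size * hidden                  # ~10.49M (untied lm_head)
attn = num_layers * 4 * hidden * hidden     # q, k, v, o projections ≈ 6.55M
mlp = num_layers * 2 * 4 * hidden * hidden  # up + down at 4x ≈ 13.11M

total = embed + head + attn + mlp
print(f"{total / 1e6:.1f}M")  # ≈ 40.6M
```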
changes vs *honkhazard-2*:
- identical main NN config
- tweaked LRs
- tuned batch count to ~halve training time
- fixed a bug that capped the dataset at ~600M tokens, which caused the data to repeat
- changed vocab size 64K -> 32K
trained on 1x rtx 5090 in 68.1 minutes

## pre-training
pre-trained only on SYNTH messages in the following format:
```
<|bos|><|user_start|>{{query}}<|user_end|><|assistant_start|><|reasoning_start|>{{synthetic_reasoning}}<|reasoning_end|>{{synthetic_answer}}<|assistant_end|>
```
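a minimal sketch of how one record gets flattened into that template; the query / reasoning / answer field names are assumptions about the SYNTH columns, not the actual preprocessing code:
```python
# flatten one synthetic record into the pre-training string; the argument names
# are assumed column names, not the real preprocessing pipeline
def format_example(query: str, reasoning: str, answer: str) -> str:
    return (
        "<|bos|>"
        f"<|user_start|>{query}<|user_end|>"
        "<|assistant_start|>"
        f"<|reasoning_start|>{reasoning}<|reasoning_end|>"
        f"{answer}"
        "<|assistant_end|>"
    )

print(format_example("what is 2+2?", "2 plus 2 is 4.", "4"))
```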
## post-training
no post-training of any form has been performed on this model
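so to use it, prompt with the pre-training template directly. a rough sketch, assuming the checkpoint ships in a transformers-compatible format and the special tokens above are in the tokenizer; the repo id is a placeholder, not the real one:
```python
# prompt the raw pre-trained checkpoint with the pre-training template;
# transformers compatibility and tokenizer special tokens are assumptions
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "your-username/honkhazard-3"  # placeholder repo id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# no post-training, so the prompt ends where the model expects to start reasoning
prompt = (
    "<|bos|><|user_start|>what is the capital of france?<|user_end|>"
    "<|assistant_start|><|reasoning_start|>"
)
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=128)
print(tok.decode(out[0]))
```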
## postmortem
being honest: this model was not intended to be fully trained, but sunk cost fallacy + curiosity made it so. loss is definitely better and training was ~2x faster, but the model seems about the same (or maybe less useful?) in practice.