honkhazard-3
40.6M (10.49M embed, 16L/8H) | 975M seen
a third experiment to train only on synthetic messages!
- parameters: 40.6M (13.11 mlp, 10.49 embed, 10.49 head, 6.55 attn)
- tokens seen: 975.2M
- num_layers: 16
- num_heads: 8
- vocab_size: 32768
changes vs honkhazard-2:
- identical main NN config
- tweaked LRs
- tuned batch count to ~halve training time
- fixed a bug that capped the dataset at ~600M tokens, which caused the data to repeat
- changed vocab size 64K -> 32K
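for reference, a minimal sketch of what this config could look like in code (the dataclass and field names are illustrative assumptions, not the actual training script):

```python
from dataclasses import dataclass

@dataclass
class HonkhazardConfig:
    # values taken from the model card above
    num_layers: int = 16
    num_heads: int = 8
    vocab_size: int = 32_768      # reduced from 64K in honkhazard-2

    # approximate parameter budget, in millions (~40.6M total)
    mlp_params_m: float = 13.11
    embed_params_m: float = 10.49
    head_params_m: float = 10.49
    attn_params_m: float = 6.55

config = HonkhazardConfig()
```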
trained on 1x RTX 5090 in 68.1 minutes
pre-training
pre-trained only on SYNTH messages in the following format:
<|bos|><|user_start|>{{query}}<|user_end|><|assistant_start|><|reasoning_start|>{{synthetic_reasoning}}<|reasoning_end|>{{synthetic_answer}}<|assistant_end|>
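a rough sketch of assembling one training example in that format (the helper name and example strings are made up for illustration):

```python
def build_synth_example(query: str, reasoning: str, answer: str) -> str:
    # wrap the three fields in the special tokens, matching the template above
    return (
        "<|bos|>"
        "<|user_start|>" + query + "<|user_end|>"
        "<|assistant_start|>"
        "<|reasoning_start|>" + reasoning + "<|reasoning_end|>"
        + answer +
        "<|assistant_end|>"
    )

print(build_synth_example("what is 2+2?", "2 plus 2 is 4.", "4"))
```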
post-training
no post-training of any form has been performed on this model
postmortem
being honest: this model was not intended to be fully trained, but sunk cost fallacy + curiosity made it so. loss is definitely better and training was ~2x faster, but the model seems about as useful as honkhazard-2, maybe slightly less?
