honkhazard-3
40.6M (10.49M embed, 16L/8H) | 975M seen

a third experiment to train only on synthetic messages!

  • parameters: 40.6M (13.11M mlp, 10.49M embed, 10.49M head, 6.55M attn); implied width sketched after this list
  • tokens seen: 975.2M
  • num_layers: 16
  • num_heads: 8
  • vocab_size: 32768
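
for context, the embed count above pins down the model width: 10.49M embedding parameters over a 32768-token vocab works out to a hidden size of ~320. a minimal back-of-envelope sketch under that assumption (names like `d_model`/`head_dim` are mine, not the actual training config keys):

```python
# back-of-envelope: derive the implied hidden size from the numbers listed above
# (field names are assumptions, not the real config)
vocab_size = 32768
embed_params = 10.49e6

d_model = round(embed_params / vocab_size)  # ~320

config = dict(
    num_layers=16,
    num_heads=8,
    vocab_size=vocab_size,
    d_model=d_model,          # ~320, implied by the 10.49M embedding table
    head_dim=d_model // 8,    # ~40 per head, if heads split d_model evenly
)
print(config)
```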

changes vs honkhazard-2:

  • identical main NN config
  • tweaked LRs
  • tuned batch count to ~halve training time
  • fixed a bug that limited the dataset to ~600M tokens, which caused data to be repeated
  • changed vocab size 64K -> 32K (parameter impact sketched after this list)
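
halving the vocab roughly halves the embedding table (and the head, if untied), which lines up with the 10.49M embed / 10.49M head counts above. a quick check, assuming a plain vocab_size x d_model table at the ~320 width derived earlier:

```python
# rough check of the embedding-size impact of the 64K -> 32K vocab change
# (assumes a plain vocab_size x d_model table and the ~320 width derived above)
d_model = 320
old_embed = 65536 * d_model   # ~20.97M params at a 64K vocab
new_embed = 32768 * d_model   # ~10.49M params at 32K, matching the count listed above
print(old_embed, new_embed)
```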

trained on 1x RTX 5090 in 68.1 minutes:

[training chart]


pre-training

pre-trained only on SYNTH messages in the following format:

<|bos|><|user_start|>{{query}}<|user_end|><|assistant_start|><|reasoning_start|>{{synthetic_reasoning}}<|reasoning_end|>{{synthetic_answer}}<|assistant_end|>
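
as a minimal sketch, this is roughly how one record would be rendered into that template before tokenization (the helper is hypothetical; `query`, `synthetic_reasoning`, and `synthetic_answer` come from the format string above):

```python
# hypothetical helper: render one synthetic record into the pre-training template above
def render_example(query: str, synthetic_reasoning: str, synthetic_answer: str) -> str:
    return (
        "<|bos|>"
        f"<|user_start|>{query}<|user_end|>"
        "<|assistant_start|>"
        f"<|reasoning_start|>{synthetic_reasoning}<|reasoning_end|>"
        f"{synthetic_answer}"
        "<|assistant_end|>"
    )

text = render_example(
    query="what is 2 + 2?",
    synthetic_reasoning="simple addition: 2 + 2 = 4.",
    synthetic_answer="4",
)
print(text)
```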

post-training

no post-training of any form has been performed on this model

postmortem

being honest: this model was not intended to be fully trained, but sunk cost fallacy + curiosity made it so. loss is definitely better and training was ~2x faster, but the model seems about as useful as honkhazard-2, maybe less?
