Intel AutoRound for best low-bit quantization
Thank you very much for the release. Could you use Intel's AutoRound tool with `--enable_alg_ext` to produce q1/q2 quants with the lowest rounding errors?
The official Intel team usually publishes mixed quantizations of such models, e.g. q2ks:
the embedding and lm-head layers fall back to 8 bits, and non-expert layers fall back to 4 bits.
Such quantizations preserve high accuracy at a small size.
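The mixed-precision recipe described above can be expressed as a per-layer bit-width map. A minimal sketch, assuming illustrative layer names (recent AutoRound releases accept a `layer_config` dict of roughly this shape; check the docs of the version you use):

```python
# Hypothetical sketch of the mixed recipe above: embedding/lm-head -> 8 bit,
# expert layers -> 2 bit, everything else -> 4 bit. Layer names are
# illustrative, not taken from any specific checkpoint.
def build_layer_config(layer_names):
    config = {}
    for name in layer_names:
        if "embed" in name or "lm_head" in name:
            bits = 8   # keep token embeddings and the output head at 8 bits
        elif ".experts." in name:
            bits = 2   # expert FFN weights take the aggressive 2-bit path
        else:
            bits = 4   # attention and shared (non-expert) layers fall back to 4 bits
        config[name] = {"bits": bits}
    return config


layers = [
    "model.embed_tokens",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.experts.3.up_proj",
    "lm_head",
]
print(build_layer_config(layers))
```

The resulting dict would then be passed to AutoRound alongside the base bit width, so only the expert layers actually land at 2 bits.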
Thank you for the message, @anjeysapkovski.
Something special is coming...
We're currently training a new class of 'REAPER-PRISM' models: lossless 2-bit and 4-bit quantizations produced with our own SigRoundV2 REAP distillation fine-tuning process.
All of my Verified Supporters get priority tokenized access to new model drops (as well as model and quant requests for upcoming releases).
Follow along on X (https://x.com/eelbaz) and Hugging Face for major announcements.
To get on the confirmed supporter/subscription list, sign up and subscribe here -> ko-fi.com/ericelbaz
To show us some love and help expedite our work -> BTC: bc1qkhh8k7t4v48g6sr0nxxjpevktkea8vmez97qas