Intel AutoRound for best low-bit quantization

#5
by anjeysapkovski - opened

Thank you very much for the release. Could you use Intel's AutoRound tool with `--enable_alg_ext` to produce the best q1/q2 quants with the lowest rounding errors?

The official Intel team usually publishes mixed quantizations of such models, e.g. q2_k_s:

The embedding and lm-head layers fall back to 8 bits, and non-expert layers fall back to 4 bits.

Such quantizations preserve high accuracy at a small size.

https://github.com/intel/auto-round
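The mixed-bit recipe above can be sketched as a per-layer bit-width map. This is a minimal illustration, not AutoRound's actual internals: the layer names are hypothetical placeholders for a MoE model, and the `{"bits": n}` override format is an assumption based on AutoRound's per-layer configuration style.

```python
def build_layer_config(layer_names, default_bits=2):
    """Assign bit-widths per the mixed recipe described above:
    embeddings and the lm-head fall back to 8 bits, non-expert
    layers to 4 bits, and expert layers keep the low default."""
    config = {}
    for name in layer_names:
        if "embed" in name or "lm_head" in name:
            config[name] = {"bits": 8}   # highest-sensitivity layers
        elif "experts" not in name:
            config[name] = {"bits": 4}   # attention / shared layers
        else:
            config[name] = {"bits": default_bits}  # expert weights
    return config

# Hypothetical layer names for a mixture-of-experts model:
layers = [
    "model.embed_tokens",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.experts.0.gate_proj",
    "lm_head",
]
print(build_layer_config(layers))
```

A map like this would then be handed to the quantizer so only the expert weights sit at the aggressive 2-bit level, which is where most of the parameters (and hence most of the size savings) live in a MoE model.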

Thank you for the message, @anjeysapkovski.

Something special is coming...

We're currently training a new class of 'REAPER-PRISM' models: lossless 2-bit and 4-bit quantizations using our own SigRoundV2 REAP distillation fine-tuning process.
All of my Verified Supporters get priority tokenized access to new model drops (as well as model and quant requests for upcoming releases).

Follow along on X (https://x.com/eelbaz) and Hugging Face for major announcements.

To get on the confirmed supporter/subscription list sign up and subscribe here -> ko-fi.com/ericelbaz
To show us some love and help expedite our work -> BTC: bc1qkhh8k7t4v48g6sr0nxxjpevktkea8vmez97qas
