In short, this mixed Q3_K_S/Q2_K_S model matches the quality of a Q4 quantization.
This model is a mixed GGUF q3_k_s/q2_k_s quantization of microsoft/NextCoder-32B, generated with the intel/auto-round algorithm. The embedding and lm-head layers fall back to 8 bits, and non-expert layers fall back to 4 bits. Please refer to the section "Generate the model" for more details.
Please follow the license of the original model.
```
auto_round_mllm.exe --model "..\nextcoder32b" --output "PATH_TO_OUTPUT_DIR" --disable_opt_rtn --batch_size 4 --low_gpu_mem_usage --format gguf:q3_k_s --iters 0 --nsample 512 --seqlen 2048 --model_dtype fp16 --scale_dtype fp16
```
AutoRound is an advanced quantization library designed for Large Language Models (LLMs) and Vision-Language Models (VLMs). It delivers high accuracy at ultra-low bit widths (2–4 bits) with minimal tuning by leveraging sign-gradient descent and offering broad hardware compatibility.
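To illustrate the sign-gradient-descent idea behind AutoRound, here is a toy NumPy sketch (not the actual AutoRound implementation; the function name, learning rate, and calibration setup are illustrative assumptions). It learns a per-weight rounding offset `v` in [-0.5, 0.5], steps it by the sign of the reconstruction-error gradient on calibration inputs, and keeps the best iterate seen:

```python
import numpy as np

def sign_gd_round(w, x, bits=3, iters=200, lr=0.005):
    """Toy sketch of sign-gradient-descent rounding (not the AutoRound API).

    w : (d,) weight vector to quantize
    x : (n, d) calibration activations
    Learns an offset v added before rounding, stepped by the sign of the
    gradient of the output-reconstruction error; returns the best iterate.
    """
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / hi        # simple symmetric scale (assumption)
    v = np.zeros_like(w)

    def dequant(v):
        # round with learned offset, clamp to the integer grid, rescale
        return np.clip(np.round(w / scale + v), lo, hi) * scale

    best_v = v.copy()
    best_loss = np.mean((x @ (dequant(v) - w)) ** 2)
    for _ in range(iters):
        err = x @ (dequant(v) - w)        # reconstruction error on calib data
        g = x.T @ err / len(x)            # straight-through gradient wrt v
        v = np.clip(v - lr * np.sign(g), -0.5, 0.5)   # sign-SGD step
        loss = np.mean((x @ (dequant(v) - w)) ** 2)
        if loss < best_loss:              # keep the best iterate seen
            best_loss, best_v = loss, v.copy()
    return dequant(best_v), best_loss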
Delivers strong performance even at 2–3 bits, with leading results at 4 bits.
https://github.com/intel/auto-round/blob/main/docs/tuning_norm_bias.md
Model tree for kalle07/NextCoder-32B-GGUF-auto-round
Base model: Qwen/Qwen2.5-32B