In short, this mixed Q3_K_S/Q2_K_S model matches the quality of a Q4 quantization.

This model is a mixed GGUF q3_k_s/q2_k_s quantization of microsoft/NextCoder-32B, generated with the intel/auto-round algorithm. The embedding and lm-head layers fall back to 8 bits, and the non-expert layers fall back to 4 bits. Please refer to the section "Generate the model" below for more details.
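As an illustration of that mixed-bit scheme, here is a rough sketch using auto-round's Python API with its documented `layer_config` override mechanism. The layer names and settings below are assumptions for illustration, not the exact recipe behind this repository (that recipe is the CLI command in "Generate the model"):

```python
# Rough sketch of a mixed-bit auto-round recipe; layer names and settings
# are illustrative, not the exact configuration used for this repository.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "microsoft/NextCoder-32B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Per-layer overrides: keep embedding and lm-head at 8 bits while the
# bulk of the network is quantized to 3 bits (q3_k_s-style).
layer_config = {
    "lm_head": {"bits": 8},
    "model.embed_tokens": {"bits": 8},  # hypothetical layer name for qwen2
}

autoround = AutoRound(
    model, tokenizer,
    bits=3, iters=0, nsamples=512, seqlen=2048,
    layer_config=layer_config,
)
autoround.quantize_and_save("PATH_TO_OUTPUT_DIR", format="gguf:q3_k_s")
```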

Please follow the license of the original model.

Generate the model:

```bash
auto_round_mllm.exe --model "..\nextcoder32b" --output "PATH_TO_OUTPUT_DIR" --disable_opt_rtn --batch_size 4 --low_gpu_mem_usage --format gguf:q3_k_s --iters 0 --nsample 512 --seqlen 2048 --model_dtype fp16 --scale_dtype fp16
```
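With `--iters 0`, auto-round skips its gradient tuning loop and falls back to RTN-style rounding, which keeps generation fast. The exported GGUF file can then be loaded with any llama.cpp-compatible runtime; below is a minimal inference sketch using llama-cpp-python, where the .gguf filename is a placeholder for whatever auto-round writes to PATH_TO_OUTPUT_DIR:

```python
# Minimal inference sketch with llama-cpp-python; the .gguf filename is a
# placeholder, not the actual file name in this repository.
from llama_cpp import Llama

llm = Llama(
    model_path="PATH_TO_OUTPUT_DIR/NextCoder-32B-q3_k_s.gguf",
    n_ctx=2048,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

out = llm.create_completion(
    "Write a Python function that reverses a string.",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```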

AutoRound is an advanced quantization library designed for Large Language Models (LLMs) and Vision-Language Models (VLMs). It delivers high accuracy at ultra-low bit widths (2–4 bits) with minimal tuning by leveraging sign-gradient descent, and it offers broad hardware compatibility. Performance stays strong even at 2–3 bits, with leading results at 4 bits (see the auto-round repository's example models and benchmarks).
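To make "sign-gradient descent" concrete, here is a simplified toy illustration of the underlying SignRound idea: a learnable rounding perturbation is optimized with signed gradient steps to minimize the layer's output reconstruction error. This is a sketch of the concept only, not auto-round's actual implementation:

```python
# Simplified illustration of sign-gradient rounding (the idea behind
# AutoRound/SignRound), not the library's actual implementation.
import torch
import torch.nn.functional as F

def ste_round(t):
    # Straight-through estimator: round in the forward pass,
    # identity gradient in the backward pass.
    return (t.round() - t).detach() + t

def fake_quant(w, scale, v, qmin=-4, qmax=3):
    # v is a learnable rounding perturbation kept in [-0.5, 0.5]; it lets
    # each weight round up or down instead of always rounding to nearest.
    return torch.clamp(ste_round(w / scale + v), qmin, qmax) * scale

torch.manual_seed(0)
w = torch.randn(64, 64)            # toy weight matrix
scale = w.abs().max() / 4          # one 3-bit scale for the whole tensor
v = torch.zeros_like(w, requires_grad=True)
x = torch.randn(16, 64)            # toy calibration activations

lr = 5e-3
for _ in range(200):
    # Minimize the quantized layer's output reconstruction error.
    loss = F.mse_loss(x @ fake_quant(w, scale, v).T, x @ w.T)
    loss.backward()
    with torch.no_grad():
        v -= lr * v.grad.sign()    # signed-gradient update
        v.clamp_(-0.5, 0.5)
        v.grad.zero_()
```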

For tuning the norm and bias layers, see: https://github.com/intel/auto-round/blob/main/docs/tuning_norm_bias.md

Format: GGUF
Model size: 33B params
Architecture: qwen2


Model tree for kalle07/NextCoder-32B-GGUF-auto-round:
Base model: Qwen/Qwen2.5-32B → quantized → this model