In short, this mixed Q3_K_S/Q2_K_S model matches the quality of a Q4 quantization.

This model is a mixed GGUF q3_k_s/q2_k_s quantization of microsoft/NextCoder-32B, generated with the intel/auto-round algorithm. The embedding and lm-head layers fall back to 8 bits, and the non-expert layers fall back to 4 bits. Please refer to the section "Generate the model" below for more details.
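As an illustration of that mixed-bit scheme, here is a rough sketch using auto-round's Python API with its documented `layer_config` override mechanism. The layer names and settings below are assumptions for illustration, not the exact recipe behind this repository (that recipe is the CLI command in "Generate the model"):

```python
# Rough sketch of a mixed-bit auto-round recipe; layer names and settings
# are illustrative, not the exact configuration used for this repository.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "microsoft/NextCoder-32B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Per-layer overrides: keep embedding and lm-head at 8 bits while the
# bulk of the network is quantized to 3 bits (q3_k_s-style).
layer_config = {
    "lm_head": {"bits": 8},
    "model.embed_tokens": {"bits": 8},  # hypothetical layer name for qwen2
}

autoround = AutoRound(
    model, tokenizer,
    bits=3, iters=0, nsamples=512, seqlen=2048,
    layer_config=layer_config,
)
autoround.quantize_and_save("PATH_TO_OUTPUT_DIR", format="gguf:q3_k_s")
```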

Please follow the license of the original model.

Generate the model:

```bash
auto_round_mllm.exe --model "..\nextcoder32b" --output "PATH_TO_OUTPUT_DIR" --disable_opt_rtn --batch_size 4 --low_gpu_mem_usage --format gguf:q3_k_s --iters 0 --nsample 512 --seqlen 2048 --model_dtype fp16 --scale_dtype fp16
```
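With `--iters 0`, auto-round skips its gradient tuning loop and falls back to RTN-style rounding, which keeps generation fast. The exported GGUF file can then be loaded with any llama.cpp-compatible runtime; below is a minimal inference sketch using llama-cpp-python, where the .gguf filename is a placeholder for whatever auto-round writes to PATH_TO_OUTPUT_DIR:

```python
# Minimal inference sketch with llama-cpp-python; the .gguf filename is a
# placeholder, not the actual file name in this repository.
from llama_cpp import Llama

llm = Llama(
    model_path="PATH_TO_OUTPUT_DIR/NextCoder-32B-q3_k_s.gguf",
    n_ctx=2048,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

out = llm.create_completion(
    "Write a Python function that reverses a string.",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```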

AutoRound is an advanced quantization library designed for Large Language Models (LLMs) and Vision-Language Models (VLMs). It delivers high accuracy at ultra-low bit widths (2–4 bits) with minimal tuning by leveraging sign-gradient descent, and it offers broad hardware compatibility. Performance stays strong even at 2–3 bits, with leading results at 4 bits (see the auto-round repository's example models and benchmarks).
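To make "sign-gradient descent" concrete, here is a simplified toy illustration of the underlying SignRound idea: a learnable rounding perturbation is optimized with signed gradient steps to minimize the layer's output reconstruction error. This is a sketch of the concept only, not auto-round's actual implementation:

```python
# Simplified illustration of sign-gradient rounding (the idea behind
# AutoRound/SignRound), not the library's actual implementation.
import torch
import torch.nn.functional as F

def ste_round(t):
    # Straight-through estimator: round in the forward pass,
    # identity gradient in the backward pass.
    return (t.round() - t).detach() + t

def fake_quant(w, scale, v, qmin=-4, qmax=3):
    # v is a learnable rounding perturbation kept in [-0.5, 0.5]; it lets
    # each weight round up or down instead of always rounding to nearest.
    return torch.clamp(ste_round(w / scale + v), qmin, qmax) * scale

torch.manual_seed(0)
w = torch.randn(64, 64)            # toy weight matrix
scale = w.abs().max() / 4          # one 3-bit scale for the whole tensor
v = torch.zeros_like(w, requires_grad=True)
x = torch.randn(16, 64)            # toy calibration activations

lr = 5e-3
for _ in range(200):
    # Minimize the quantized layer's output reconstruction error.
    loss = F.mse_loss(x @ fake_quant(w, scale, v).T, x @ w.T)
    loss.backward()
    with torch.no_grad():
        v -= lr * v.grad.sign()    # signed-gradient update
        v.clamp_(-0.5, 0.5)
        v.grad.zero_()
```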

For tuning the norm and bias layers, see: https://github.com/intel/auto-round/blob/main/docs/tuning_norm_bias.md

Format: GGUF
Model size: 33B params
Architecture: qwen2


Model tree for kalle07/NextCoder-32B-GGUF-auto-round:
Base model: Qwen/Qwen2.5-32B → quantized → this model