GLM-4.5-Air-Derestricted-GGUF

This repository contains several custom GGUF quantizations of ArliAI/GLM-4.5-Air-Derestricted, to be used with llama.cpp.

The naming scheme for these custom quantizations is as follows:

ModelName-DefaultType-FFN-UpType-GateType-DownType.gguf

Where DefaultType refers to the default tensor type, and UpType, GateType, and DownType refer to the types used for the ffn_up_exps, ffn_gate_exps, and ffn_down_exps tensors, respectively.
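
As an illustration, here is a small Python sketch (a hypothetical helper, not part of this repo or of llama.cpp) that splits such a filename back into its components under the naming scheme above:

```python
# Hypothetical helper: parse the quantization naming scheme described above.
# Assumes the "-FFN-" marker separates the default type from the three
# per-expert-tensor types; filenames without it use one type throughout.

def parse_quant_name(filename: str) -> dict:
    stem = filename.removesuffix(".gguf")
    if "-FFN-" in stem:
        head, ffn = stem.split("-FFN-")
        up_type, gate_type, down_type = ffn.split("-")
        model_name, default_type = head.rsplit("-", 1)
        return {
            "model": model_name,
            "default": default_type,       # applied to all other tensors
            "ffn_up_exps": up_type,
            "ffn_gate_exps": gate_type,
            "ffn_down_exps": down_type,
        }
    # Plain quantizations like ...-Q8_0.gguf or ...-bf16.gguf
    model_name, default_type = stem.rsplit("-", 1)
    return {"model": model_name, "default": default_type}

print(parse_quant_name(
    "GLM-4.5-Air-Derestricted-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf"
))
```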

Quantizations

These quantizations use Q8_0 for all tensors by default, including the dense FFN block; only the conditional (routed) expert tensors are downgraded, and the shared expert is always kept at Q8_0. They were quantized using my own imatrix (the calibration text corpus can be found here).
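
For reference, a sketch of how a mixed quantization like this could be produced with llama.cpp's llama-imatrix and llama-quantize tools. This is not the exact command used for these files: the paths are placeholders, and while recent llama.cpp builds provide a --tensor-type per-tensor override flag, its exact spelling and pattern syntax may differ in your build, so verify against your version's --help.

```python
# Sketch only: driving llama.cpp's quantization tools from Python.
# Paths are placeholders; check the --tensor-type flag and pattern
# syntax against your llama.cpp build before relying on this.
import subprocess

# 1) Build an importance matrix from a calibration corpus.
subprocess.run([
    "llama-imatrix",
    "-m", "GLM-4.5-Air-Derestricted-bf16.gguf",
    "-f", "calibration.txt",
    "-o", "imatrix.dat",
], check=True)

# 2) Quantize with Q8_0 as the default type, downgrading only the
#    conditional (routed) expert tensors.
subprocess.run([
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "--tensor-type", "ffn_up_exps=iq4_xs",
    "--tensor-type", "ffn_gate_exps=iq4_xs",
    "--tensor-type", "ffn_down_exps=q5_0",
    "GLM-4.5-Air-Derestricted-bf16.gguf",
    "GLM-4.5-Air-Derestricted-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf",
    "Q8_0",
], check=True)
```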

| Filename | Size (GB) | Size (GiB) | Average BPW | Direct link |
|---|---:|---:|---:|---|
| GLM-4.5-Air-Derestricted-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf | 68.63 | 63.92 | 4.97 | Download |
| GLM-4.5-Air-Derestricted-Q8_0-FFN-Q5_K-Q5_K-Q8_0.gguf | 91.97 | 85.66 | 6.66 | Download |
| GLM-4.5-Air-Derestricted-Q8_0-FFN-Q6_K-Q6_K-Q8_0.gguf | 100.99 | 94.06 | 7.31 | Download |
| GLM-4.5-Air-Derestricted-Q8_0.gguf | 117.45 | 109.38 | 8.51 | Download |
| GLM-4.5-Air-Derestricted-bf16.gguf | 220.98 | 205.81 | 16.00 | Download 1/2 · Download 2/2 |
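
As a sanity check on the Average BPW column: the bf16 file stores every weight at exactly 16.00 bits, so its size pins down the total parameter count, and from that the BPW of any other file follows. A small Python sketch of the arithmetic, using only numbers from the table:

```python
# Sanity-check the "Average BPW" column using only the table's numbers.
GIB = 2**30

# The bf16 file is exactly 16 bits per weight, so its size in bits
# divided by 16 recovers the total parameter count.
bf16_bits = 205.81 * GIB * 8
n_params = bf16_bits / 16.0
print(f"{n_params / 1e9:.2f}B parameters")   # ~110.49B

# Average BPW of the smallest mixed quantization, from its GiB size.
quant_bits = 63.92 * GIB * 8
print(f"{quant_bits / n_params:.2f} BPW")    # ~4.97, matching the table
```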
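
For completeness, a minimal usage sketch with the llama-cpp-python bindings (this assumes your installed build is recent enough to support this model's architecture; the model path matches the table above, and the parameters shown are illustrative, not recommendations):

```python
# Minimal sketch: loading one of these quantizations with llama-cpp-python.
# Requires a build recent enough to support this model's architecture;
# n_ctx and n_gpu_layers below are illustrative values only.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4.5-Air-Derestricted-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf",
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```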