# GLM-4.5-Air-Derestricted-GGUF
This repository contains several custom GGUF quantizations of ArliAI/GLM-4.5-Air-Derestricted, to be used with llama.cpp.
The naming scheme for these custom quantizations is as follows:
`ModelName-DefaultType-FFN-UpType-GateType-DownType.gguf`
Here, `DefaultType` is the default tensor type, and `UpType`, `GateType`, and `DownType` are the tensor types used for the `ffn_up_exps`, `ffn_gate_exps`, and `ffn_down_exps` tensors, respectively.
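As an illustration of the scheme, the sketch below (a hypothetical helper, not part of this repository's tooling) splits such a filename back into its components:

```python
def parse_quant_name(filename: str) -> dict:
    """Split a ModelName-DefaultType-FFN-UpType-GateType-DownType.gguf
    filename into its components. Plain quants without an -FFN- part
    use the default type everywhere."""
    stem = filename.removesuffix(".gguf")
    if "-FFN-" in stem:
        head, ffn = stem.split("-FFN-", 1)
        up, gate, down = ffn.split("-")
    else:
        head, up, gate, down = stem, None, None, None
    model, default = head.rsplit("-", 1)
    return {"model": model, "default": default,
            "ffn_up_exps": up or default,
            "ffn_gate_exps": gate or default,
            "ffn_down_exps": down or default}

print(parse_quant_name("GLM-4.5-Air-Derestricted-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf"))
# {'model': 'GLM-4.5-Air-Derestricted', 'default': 'Q8_0',
#  'ffn_up_exps': 'IQ4_XS', 'ffn_gate_exps': 'IQ4_XS', 'ffn_down_exps': 'Q5_0'}
```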
## Quantizations
These quantizations use Q8_0 for all tensors by default, including the dense FFN block. Only the conditional experts (the `ffn_*_exps` tensors) are downgraded; the shared expert is always kept at Q8_0. They were quantized using my own imatrix (the calibration text corpus can be found here).
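To confirm how a downloaded file is laid out, you can list the per-tensor types with the `gguf` Python package that ships with llama.cpp (`pip install gguf`). A minimal sketch, using one of the files from the table below:

```python
from gguf import GGUFReader

# Print the quantization type of every FFN tensor in the file. Expert
# tensors (*_exps) should show the downgraded types from the filename;
# dense FFN and shared-expert (*_shexp) tensors should report Q8_0.
reader = GGUFReader("GLM-4.5-Air-Derestricted-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf")
for tensor in reader.tensors:
    if "ffn" in tensor.name:
        print(tensor.name, tensor.tensor_type.name)
```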
| Filename | Size (GB) | Size (GiB) | Average BPW | Direct link |
|---|---|---|---|---|
| GLM-4.5-Air-Derestricted-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf | 68.63 | 63.92 | 4.97 | Download |
| GLM-4.5-Air-Derestricted-Q8_0-FFN-Q5_K-Q5_K-Q8_0.gguf | 91.97 | 85.66 | 6.66 | Download |
| GLM-4.5-Air-Derestricted-Q8_0-FFN-Q6_K-Q6_K-Q8_0.gguf | 100.99 | 94.06 | 7.31 | Download |
| GLM-4.5-Air-Derestricted-Q8_0.gguf | 117.45 | 109.38 | 8.51 | Download |
| GLM-4.5-Air-Derestricted-bf16.gguf | 220.98 | 205.81 | 16.00 | Download 1/2, Download 2/2 |
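For reference, the "Average BPW" column is simply file size in bits divided by parameter count. A quick back-of-the-envelope check (assuming the parameter count implied by the bf16 file at 2 bytes per weight; small discrepancies are expected since the sizes above are rounded):

```python
# Parameter count implied by the bf16 file (16.00 BPW = 2 bytes per weight).
params = 220.98e9 / 2  # ~110.5B parameters

for name, size_gb in [
    ("Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0", 68.63),
    ("Q8_0", 117.45),
    ("bf16", 220.98),
]:
    bpw = size_gb * 1e9 * 8 / params
    print(f"{name}: {bpw:.2f} BPW")  # ~4.97, ~8.50, 16.00 — matches the table within rounding
```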