GLM-4.5-Air-Derestricted-GGUF

This repository contains several custom GGUF quantizations of ArliAI/GLM-4.5-Air-Derestricted, to be used with llama.cpp.

The naming scheme for these custom quantizations is as follows:

ModelName-DefaultType-FFN-UpType-GateType-DownType.gguf

Where DefaultType refers to the default tensor type, and UpType, GateType, and DownType refer to the types used for the ffn_up_exps, ffn_gate_exps, and ffn_down_exps tensors, respectively.
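
As an illustration, here is a small Python sketch (a hypothetical helper, not part of this repo or of llama.cpp) that splits such a filename back into its components under the naming scheme above:

```python
# Hypothetical helper: parse the quantization naming scheme described above.
# Assumes the "-FFN-" marker separates the default type from the three
# per-expert-tensor types; filenames without it use one type throughout.

def parse_quant_name(filename: str) -> dict:
    stem = filename.removesuffix(".gguf")
    if "-FFN-" in stem:
        head, ffn = stem.split("-FFN-")
        up_type, gate_type, down_type = ffn.split("-")
        model_name, default_type = head.rsplit("-", 1)
        return {
            "model": model_name,
            "default": default_type,       # applied to all other tensors
            "ffn_up_exps": up_type,
            "ffn_gate_exps": gate_type,
            "ffn_down_exps": down_type,
        }
    # Plain quantizations like ...-Q8_0.gguf or ...-bf16.gguf
    model_name, default_type = stem.rsplit("-", 1)
    return {"model": model_name, "default": default_type}

print(parse_quant_name(
    "GLM-4.5-Air-Derestricted-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf"
))
```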

Quantizations

These quantizations use Q8_0 for all tensors by default, including the dense FFN block; only the conditional (routed) expert tensors are downgraded, and the shared expert is always kept at Q8_0. They were quantized using my own imatrix (the calibration text corpus can be found here).
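
For reference, a sketch of how a mixed quantization like this could be produced with llama.cpp's llama-imatrix and llama-quantize tools. This is not the exact command used for these files: the paths are placeholders, and while recent llama.cpp builds provide a --tensor-type per-tensor override flag, its exact spelling and pattern syntax may differ in your build, so verify against your version's --help.

```python
# Sketch only: driving llama.cpp's quantization tools from Python.
# Paths are placeholders; check the --tensor-type flag and pattern
# syntax against your llama.cpp build before relying on this.
import subprocess

# 1) Build an importance matrix from a calibration corpus.
subprocess.run([
    "llama-imatrix",
    "-m", "GLM-4.5-Air-Derestricted-bf16.gguf",
    "-f", "calibration.txt",
    "-o", "imatrix.dat",
], check=True)

# 2) Quantize with Q8_0 as the default type, downgrading only the
#    conditional (routed) expert tensors.
subprocess.run([
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "--tensor-type", "ffn_up_exps=iq4_xs",
    "--tensor-type", "ffn_gate_exps=iq4_xs",
    "--tensor-type", "ffn_down_exps=q5_0",
    "GLM-4.5-Air-Derestricted-bf16.gguf",
    "GLM-4.5-Air-Derestricted-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf",
    "Q8_0",
], check=True)
```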

| Filename | Size (GB) | Size (GiB) | Average BPW | Direct link |
|---|---:|---:|---:|---|
| GLM-4.5-Air-Derestricted-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf | 68.63 | 63.92 | 4.97 | Download |
| GLM-4.5-Air-Derestricted-Q8_0-FFN-Q5_K-Q5_K-Q8_0.gguf | 91.97 | 85.66 | 6.66 | Download |
| GLM-4.5-Air-Derestricted-Q8_0-FFN-Q6_K-Q6_K-Q8_0.gguf | 100.99 | 94.06 | 7.31 | Download |
| GLM-4.5-Air-Derestricted-Q8_0.gguf | 117.45 | 109.38 | 8.51 | Download |
| GLM-4.5-Air-Derestricted-bf16.gguf | 220.98 | 205.81 | 16.00 | Download 1/2 · Download 2/2 |
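
As a sanity check on the Average BPW column: the bf16 file stores every weight at exactly 16.00 bits, so its size pins down the total parameter count, and from that the BPW of any other file follows. A small Python sketch of the arithmetic, using only numbers from the table:

```python
# Sanity-check the "Average BPW" column using only the table's numbers.
GIB = 2**30

# The bf16 file is exactly 16 bits per weight, so its size in bits
# divided by 16 recovers the total parameter count.
bf16_bits = 205.81 * GIB * 8
n_params = bf16_bits / 16.0
print(f"{n_params / 1e9:.2f}B parameters")   # ~110.49B

# Average BPW of the smallest mixed quantization, from its GiB size.
quant_bits = 63.92 * GIB * 8
print(f"{quant_bits / n_params:.2f} BPW")    # ~4.97, matching the table
```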
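
For completeness, a minimal usage sketch with the llama-cpp-python bindings (this assumes your installed build is recent enough to support this model's architecture; the model path matches the table above, and the parameters shown are illustrative, not recommendations):

```python
# Minimal sketch: loading one of these quantizations with llama-cpp-python.
# Requires a build recent enough to support this model's architecture;
# n_ctx and n_gpu_layers below are illustrative values only.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4.5-Air-Derestricted-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf",
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```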