KV Cache Quantization
Collection on FP8 Quantization of Weights, Activations and KV Cache
| Task | Context Length | meta-llama/Llama-3.1-8B-Instruct | Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head | Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Tensor | Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Head | Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Tensor |
|---|---|---|---|---|---|---|
| NIAH Single 2 | 4096 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 16384 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 32768 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 65536 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 131072 | -- | -- | -- | 99.4 | 99.0 |
Base model: meta-llama/Llama-3.1-8B