Model Performance Comparison
| Model | Task | Metric | Value ↑ | Stderr (±) |
|---|---|---|---|---|
| FP16 | xnli_en | acc | 0.4811 | 0.0100 |
| FP16 | xstorycloze_en | acc | 0.6446 | 0.0123 |
| FP16 | xwinograd_en | acc | 0.7286 | 0.0092 |
| GPTQ 4-bit | xnli_en | acc | 0.4952 | 0.0100 |
| GPTQ 4-bit | xstorycloze_en | acc | 0.6406 | 0.0123 |
| GPTQ 4-bit | xwinograd_en | acc | 0.7256 | 0.0093 |

| Model | Runtime (mm:ss) | Size (GB) |
|---|---|---|
| FP16 | 13:11 | 2.13 |
| GPTQ 4-bit | 15:02 | 1.13 |
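To judge whether the GPTQ 4-bit accuracy changes in the table above are larger than the reported noise, one rough check is to compare each difference against the combined standard error of the two estimates. The sketch below assumes the two runs are independent; since both score the same examples, this likely overstates the uncertainty, so treat it as a coarse check only. The `fp16`, `gptq`, and `compare` names are illustrative.

```python
import math

# (accuracy, stderr) per task, copied from the accuracy table
fp16 = {
    "xnli_en": (0.4811, 0.0100),
    "xstorycloze_en": (0.6446, 0.0123),
    "xwinograd_en": (0.7286, 0.0092),
}
gptq = {
    "xnli_en": (0.4952, 0.0100),
    "xstorycloze_en": (0.6406, 0.0123),
    "xwinograd_en": (0.7256, 0.0093),
}


def compare(task):
    """Return (delta, combined_se, within_2se) for GPTQ minus FP16 on one task.

    Assumes independent estimates; both runs use the same examples,
    so the combined standard error is probably an overestimate.
    """
    acc_fp16, se_fp16 = fp16[task]
    acc_gptq, se_gptq = gptq[task]
    delta = acc_gptq - acc_fp16
    combined_se = math.sqrt(se_fp16 ** 2 + se_gptq ** 2)
    return delta, combined_se, abs(delta) <= 2 * combined_se


for task in fp16:
    delta, se, within = compare(task)
    print(f"{task}: delta={delta:+.4f}, combined_se={se:.4f}, within_2se={within}")
```

With these numbers, every task's difference falls within two combined standard errors.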
Performance Metrics Comparison
| Metric | FP16 | GPTQ 4-bit |
|---|---|---|
| p50_total_tps | 52.813 | 79.552 |
| p90_total_tps | 120.742 | 119.646 |
| p50_decode_tps | 22.992 | -31.095 |
| p90_decode_tps | 33.487 | 2.375 |
| p50_ttft_seconds | 0.002 | 0.003 |
| p90_ttft_seconds | 0.003 | 0.011 |
| max_gpu_memory_mb | 2232.0 | 1258.0 |
| p90_gpu_memory_mb | 2232.0 | 1258.0 |
| max_gpu_utilization | 51.0 | 49.0 |
| p90_gpu_utilization | 48.0 | 42.7 |
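As a reading aid for the table above, the sketch below divides the GPTQ 4-bit value by the FP16 value for a few rows. The decode_tps rows are left out because the reported GPTQ p50 value is negative, so a ratio would not be meaningful, and the TTFT rows are left out because they are rounded to the millisecond. The `metrics` dict and `ratio` helper are illustrative, not part of any benchmarking tool.

```python
# (FP16, GPTQ 4-bit) values copied from the performance table
metrics = {
    "p50_total_tps": (52.813, 79.552),
    "p90_total_tps": (120.742, 119.646),
    "max_gpu_memory_mb": (2232.0, 1258.0),
}


def ratio(name):
    """GPTQ 4-bit value divided by the FP16 value for one metric."""
    fp16_value, gptq_value = metrics[name]
    return gptq_value / fp16_value


for name in metrics:
    print(f"{name}: GPTQ/FP16 = {ratio(name):.3f}")
```

This gives roughly 1.51x the p50 total throughput and about 0.56x the peak GPU memory for GPTQ 4-bit, with p90 total throughput essentially unchanged (about 0.99x).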
Model tree for itdainb/bloomz-1b1-w4g128-auto-gptq
- Base model: bigscience/bloomz-1b1
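The repository name `bloomz-1b1-w4g128-auto-gptq` suggests 4-bit weights with a quantization group size of 128; that reading of the naming convention is an assumption, not something stated on this card. The sketch below illustrates group-wise 4-bit storage, where each group of 128 weights gets its own scale and zero point, using plain round-to-nearest (RTN) quantization. It is not the GPTQ algorithm, which additionally uses calibration data and approximate second-order information to compensate rounding error. All names are hypothetical.

```python
import random

BITS = 4              # assumed from "w4" in the repository name
GROUP_SIZE = 128      # assumed from "g128" in the repository name
QMAX = 2 ** BITS - 1  # unsigned 4-bit codes range over 0..15


def quantize_group(weights):
    """Asymmetric round-to-nearest quantization of one group.

    Returns (codes, scale, zero) such that w is approximately (code - zero) * scale.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / QMAX if hi > lo else 1.0
    zero = round(-lo / scale)
    # Clamp so every code fits in BITS bits
    codes = [min(QMAX, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map 4-bit codes back to approximate float weights."""
    return [(c - zero) * scale for c in codes]


def quantize_row(row):
    """Split a weight row into GROUP_SIZE chunks, each with its own scale/zero."""
    return [quantize_group(row[i:i + GROUP_SIZE]) for i in range(0, len(row), GROUP_SIZE)]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(512)]  # synthetic weights
groups = quantize_row(row)
restored = [w for g in groups for w in dequantize_group(*g)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), max_err)
```

Keeping a separate scale per 128-weight group limits how far a single outlier can stretch the quantization grid, at the cost of storing one scale and zero point per group.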