Inference Optimization
community
Recent Activity
inference-optimization/test_tencentbac_fastmtp
Updated • 43
inference-optimization/test_qwen3_next_mtp
Updated • 46
inference-optimization/Qwen3-Next-80B-A3B-Instruct_mtp_speculator
Text Generation • 2B • Updated • 57
inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch3
2B • Updated • 18
FP8-block, FP8-dynamic, NVFP4, w4a16, and w8a8 quantized versions of ibm-granite/granite-4.0-h-small and ibm-granite/granite-4.0-h-tiny
models 177
inference-optimization/Qwen3-30B-A3B-Thinking-2507.w4a16
Text Generation • 5B • Updated • 46
inference-optimization/gpt-oss-20b-from-gpt-oss-120b-ckpt3-speculator.eagle3
0.9B • Updated • 10
inference-optimization/Mistral-Small-4-119B-2603-BF16
119B • Updated • 141
inference-optimization/gpt-oss-20b-from-gpt-oss-120b-ckpt2-speculator.eagle3
0.9B • Updated • 51
inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch3
2B • Updated • 18
inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch2
2B • Updated • 9
inference-optimization/Qwen3-Next-80B-A3B-Instruct-MTP-ultrachat-epoch1
2B • Updated • 12
inference-optimization/gpt-oss-20b-from-gpt-oss-120b-ckpt1-speculator.eagle3
0.9B • Updated • 37
inference-optimization/gpt-oss-20b-from-gpt-oss-120b-ckpt0-speculator.eagle3
0.9B • Updated • 43
inference-optimization/Llama-3.1-8B-Instruct-NVFP4-DDP8
5B • Updated • 12