Uploading... 🐌

moonshotai/Kimi-K2.7-Code optimized for running on a Mac Studio M3 Ultra.

A mixed-precision quant that balances speed, memory, and accuracy.
3-bit MoE baseline with important always-on layers at higher precision.
Fits into ~460 GB memory, leaving enough room for a smaller utility model.

Usage

# Start server at http://localhost:8080/v1/chat/completions
uvx --from mlx-lm mlx_lm.server \
  --host 127.0.0.1 \
  --port 8080 \
  --model spicyneuron/Kimi-K2.7-Code-MLX-3.6bit

Benchmarks

TBD

Methodology

Quantized with a mlx-lm fork. MLX quantization options differ than llama.cpp, but the principles are the same:

Sensitive layers like MoE routing, attention, and output embeddings get higher precision
More tolerant layers like MoE experts get lower precision

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for spicyneuron/Kimi-K2.7-Code-MLX-3.6bit

Base model

moonshotai/Kimi-K2.7-Code

Quantized

(2)

this model