Qwen-72B-Math-NF4

NF4 quantized Qwen2.5-Math-72B-Instruct for mathematical reasoning.

Quantization

Method: bitsandbytes NF4 with double quantization
Compute dtype: bfloat16
Original model: Qwen/Qwen2.5-Math-72B-Instruct

Memory Requirements

Setup	VRAM
Single GPU	~40GB
2x GPU	~20GB each

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "aphoticshaman/qwen-72b-math-nf4",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("aphoticshaman/qwen-72b-math-nf4")

prompt = "Prove that the sum of first n integers is n(n+1)/2."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Intended Use

AIMO/ARC Prize mathematical reasoning
Olympiad problem solving
Step-by-step proofs
Numerical computation

Author

Ryan J Cardwell X @Benthic_Shadow Zenodo.org aphoticshaman huggingface aphoticshaman

Downloads last month: 19

Safetensors

Model size

73B params

Tensor type

BF16

F32

Model tree for aphoticshaman/qwen-72b-math-nf4

Base model

Qwen/Qwen2.5-72B

Finetuned

Qwen/Qwen2.5-Math-72B

Finetuned

Qwen/Qwen2.5-Math-72B-Instruct

Quantized

(13)

this model