Qwen-72B-Math-NF4

NF4 quantized Qwen2.5-Math-72B-Instruct for mathematical reasoning.

Quantization

  • Method: bitsandbytes NF4 with double quantization
  • Compute dtype: bfloat16
  • Original model: Qwen/Qwen2.5-Math-72B-Instruct

Memory Requirements

Setup VRAM
Single GPU ~40GB
2x GPU ~20GB each

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "aphoticshaman/qwen-72b-math-nf4",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("aphoticshaman/qwen-72b-math-nf4")

prompt = "Prove that the sum of first n integers is n(n+1)/2."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Intended Use

  • AIMO/ARC Prize mathematical reasoning
  • Olympiad problem solving
  • Step-by-step proofs
  • Numerical computation

Author

Ryan J Cardwell X @Benthic_Shadow Zenodo.org aphoticshaman huggingface aphoticshaman

Downloads last month
19
Safetensors
Model size
73B params
Tensor type
BF16
F32
U8
Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support

Model tree for aphoticshaman/qwen-72b-math-nf4

Base model

Qwen/Qwen2.5-72B
Quantized
(13)
this model