OpenMath
Fine-tuning a Small Language Model (SLM) for Step-by-Step Math Reasoning
Overview
OpenMath is an open-source project focused on fine-tuning a small language model for step-by-step math reasoning using QLoRA (LoRA fine-tuning on a 4-bit quantized base model).
This repository contains only a LoRA adapter trained on GSM8K. Users must load the base model separately and attach the adapter.
The latest version of this model was trained on an AMD MI300X GPU using ROCm, showing that modern non-NVIDIA accelerators work with standard Hugging Face and PyTorch fine-tuning workflows.
Base Model
Qwen/Qwen2.5-Math-1.5B
This repository does not contain the base model weights; they must be loaded from Hugging Face.
Hardware Used (Latest Training Run)
GPU: AMD MI300X (ROCm 7.0)
VRAM: 192 GB
Operating System: Ubuntu 24.04
Framework: PyTorch + Hugging Face
Backend: ROCm
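For reference, the snippet below is one way to confirm that a ROCm build of PyTorch sees the MI300X. On ROCm, the GPU is exposed through the familiar torch.cuda API, so the check is identical to an NVIDIA setup. This is only an illustrative sanity check, not part of any shipped script.

```python
import torch

# On ROCm builds of PyTorch, the HIP backend is exposed through the
# torch.cuda namespace, so the same checks work as on NVIDIA GPUs.
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))  # e.g. an MI300X
    print("HIP version:", torch.version.hip)              # set only on ROCm builds
    props = torch.cuda.get_device_properties(0)
    print("VRAM (GB):", round(props.total_memory / 1024**3))
```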
Dataset
GSM8K (Grade School Math 8K)
Training samples: 1,000
Evaluation: Full GSM8K test split (1,319 problems)
Loss masking was applied so that only the solution portion of each example contributed to the training loss; prompt tokens were ignored (see the sketch below).
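The following is a minimal sketch of how that masking can be implemented with the Hugging Face datasets and transformers libraries. The prompt layout and the train[:1000] slice are assumptions for illustration; the actual training script is not included in this repository.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-1.5B")
train_split = load_dataset("gsm8k", "main", split="train[:1000]")

def tokenize_with_loss_mask(example, max_len=1024):
    # Hypothetical prompt layout; it only illustrates how solution tokens
    # are kept for the loss while prompt tokens are masked out.
    prompt = f"Problem:\n{example['question']}\n\nSolution:\n"
    solution = example["answer"] + tokenizer.eos_token

    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    solution_ids = tokenizer(solution, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + solution_ids)[:max_len]
    # -100 tells the cross-entropy loss to ignore these positions, so only
    # the solution portion contributes to the training loss.
    labels = ([-100] * len(prompt_ids) + solution_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}

train_ds = train_split.map(tokenize_with_loss_mask, remove_columns=train_split.column_names)
```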
Training Configuration
Method: QLoRA (4-bit)
Quantization: NF4 with float16 compute
LoRA rank: 16
LoRA alpha: 32
LoRA dropout: 0.05
Target modules: q_proj, k_proj, v_proj, o_proj
Max sequence length: 1024
Batch size: 1
Gradient accumulation: 16
Effective batch size: 16
Learning rate: 1e-4
Optimizer: paged_adamw_8bit
Scheduler: cosine
Warmup ratio: 5 percent
Epochs: 6
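The hyperparameters above map onto standard bitsandbytes, PEFT, and Transformers configuration objects roughly as follows. This is a reconstruction from the list, not the original training script; names such as the output directory are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with float16 compute, as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Math-1.5B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Effective batch size 16 = per-device batch 1 x gradient accumulation 16.
training_args = TrainingArguments(
    output_dir="openmath-qlora",        # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=1e-4,
    num_train_epochs=6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="paged_adamw_8bit",
    fp16=True,
    logging_steps=10,
)
```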
Results
GSM8K Accuracy (Full Test Set):
750 out of 1,319 correct, for an accuracy of 56.86 percent.
This improves markedly on the earlier Colab T4 run and is a strong result for a 1.5B-parameter model fine-tuned with LoRA.
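GSM8K accuracy is typically computed by comparing the model's final numeric answer to the reference value that follows the "####" marker in each test example. The helper below is a plausible sketch of that scoring step, not the exact evaluation code used to produce the number above.

```python
import re

def extract_final_number(text: str):
    """Return the last number appearing in a string, or None."""
    matches = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return matches[-1] if matches else None

def gsm8k_correct(model_output: str, reference_answer: str) -> bool:
    # GSM8K references put the final answer after '####'.
    gold = extract_final_number(reference_answer.split("####")[-1])
    pred = extract_final_number(model_output)
    return pred is not None and gold is not None and float(pred) == float(gold)

# Example: accuracy over the full 1,319-problem test split.
# correct = sum(gsm8k_correct(o, ref) for o, ref in zip(outputs, references))
# accuracy = correct / len(references)   # 750 / 1319 ≈ 0.5686
```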
What This Repository Contains
adapter_model.safetensors - LoRA weights
adapter_config.json - LoRA configuration
chat_template.jinja - chat formatting template
tokenizer.json - tokenizer file
tokenizer_config.json - tokenizer settings
README.md - documentation
This repository does not include checkpoints, optimizer states, or full base model weights.
How to Use This Model
Load the base model Qwen/Qwen2.5-Math-1.5B from Hugging Face, then attach this LoRA adapter using PEFT. Generate answers using a prompt that includes an instruction, problem, and solution section.
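A minimal usage sketch along those lines is shown below. The adapter path and the exact prompt wording are placeholders; the repository's chat_template.jinja can be used in place of the hand-written prompt.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2.5-Math-1.5B"
ADAPTER = "path/or/hub-id/of/this-adapter"   # placeholder for this repository

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)

# Prompt with instruction, problem, and solution sections (wording is illustrative).
prompt = (
    "Instruction: Solve the math problem step by step.\n\n"
    "Problem: Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?\n\n"
    "Solution:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Print only the newly generated tokens (the model's solution).
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```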
Why This Matters
This project demonstrates that AMD MI300X can train modern language models with Hugging Face and QLoRA.
It shows that high-quality math reasoning is possible at 1.5B parameters using efficient fine-tuning.
It provides a lightweight adapter instead of requiring users to download a massive full model.
Limitations
The model can make reasoning mistakes.
It should not be used for exams, assignments, or professional decisions.
Performance depends heavily on prompt formatting.
Future Work
Train on 3,000 to 5,000 GSM8K samples.
Add SVAMP and ASDiv datasets.
Improve decoding to reduce repetition.
Experiment with multi-GPU scaling on MI300X.
Add a Streamlit demo for interactive use.
License
cc-by-nc-4.0