# Model Card for test2-two-strategies
This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct). It has been trained using [TRL](https://github.com/huggingface/trl).
## Model Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Training Type | QLoRA (see the loading sketch below the table) |
| LoRA Rank (r) | N/A |
| LoRA Alpha | N/A |
| Strategies | SFT (1 epoch) → CoT (1 epoch) |
| Batch Size | 4 |
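
Since the training type is QLoRA, the base model was presumably loaded in 4-bit before LoRA adapters were attached. The quantization settings below are illustrative assumptions, not values taken from `training_config.yaml`:

```python
# A minimal sketch of a QLoRA-style 4-bit base-model load.
# Assumption: NF4 quantization with bfloat16 compute; the settings actually
# used for training are recorded in training_config.yaml.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    quantization_config=bnb_config,
)
```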
## Training procedure
Training metrics are tracked locally with TensorBoard and MLflow.
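
For example, a local MLflow run store can be inspected like this (a sketch assuming the default `file:./mlruns` tracking location; the metric names depend on what the trainer logged):

```python
import mlflow

# Assumption: metrics were logged to the default local ./mlruns directory.
mlflow.set_tracking_uri("file:./mlruns")
runs = mlflow.search_runs(search_all_experiments=True)
print(runs.filter(like="metrics.").head())  # one column per logged metric
```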
### Framework versions
- PEFT: 0.18.0
- TRL: 0.25.1
- Transformers: 4.57.3
- PyTorch: 2.9.1
- Datasets: 3.6.0
- Tokenizers: 0.22.1
## Training Config
The full training configuration is available in `training_config.yaml`.
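
Assuming the file is uploaded to the model repository, it can be fetched and inspected like this:

```python
import yaml
from huggingface_hub import hf_hub_download

# Assumption: training_config.yaml sits at the root of the model repo.
path = hf_hub_download("Tranium/test2-two-strategies", "training_config.yaml")
with open(path) as f:
    print(yaml.safe_load(f))
```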
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Tranium/test2-two-strategies")
tokenizer = AutoTokenizer.from_pretrained("Tranium/test2-two-strategies")

# Build a chat prompt using the model's chat template.
messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize and generate a response.
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
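
If the repository hosts only PEFT adapter weights rather than merged weights (a possibility given the QLoRA training type, not confirmed here), load the base model first and attach the adapter:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Assumption: the repo contains a PEFT adapter for the Qwen2.5 base model.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base, "Tranium/test2-two-strategies")
```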
## Training Infrastructure
- Platform: `single_node`
- GPU: auto-detect
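
"Auto-detect" here presumably means the trainer picks the first available accelerator; a minimal sketch of that pattern:

```python
import torch

# Assumption: "auto-detect" selects CUDA when available, falling back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Training device: {device}")
```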