Model Card for test2-two-strategies

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct. It has been trained using TRL.

Model Details

Parameter       Value
Base Model      Qwen/Qwen2.5-0.5B-Instruct
Training Type   qlora
LoRA Rank (r)   N/A
LoRA Alpha      N/A
Strategies      SFT (1ep) → COT (1ep)
Batch Size      4
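The two strategies run sequentially: the CoT stage resumes from the checkpoint produced by the SFT stage. A minimal stdlib sketch of that control flow (run_stage and the returned identifiers are hypothetical placeholders; the actual training is driven by TRL and training_config.yaml):

```python
# Sketch of the sequential two-strategy pipeline. run_stage is a
# hypothetical stand-in for one TRL fine-tuning stage.
def run_stage(base_model: str, strategy: str, epochs: int) -> str:
    # Placeholder for one fine-tuning stage; returns a checkpoint id.
    return f"{base_model}+{strategy}x{epochs}"

checkpoint = "Qwen/Qwen2.5-0.5B-Instruct"
for strategy, epochs in [("SFT", 1), ("COT", 1)]:
    # Each stage continues from the previous stage's checkpoint.
    checkpoint = run_stage(checkpoint, strategy, epochs)

print(checkpoint)  # final checkpoint identifier after both stages
```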

Training procedure

Training metrics are tracked locally with TensorBoard and MLflow.

Framework versions

  • PEFT: 0.18.0
  • TRL: 0.25.1
  • Transformers: 4.57.3
  • PyTorch: 2.9.1
  • Datasets: 3.6.0
  • Tokenizers: 0.22.1

Training Config

The full training configuration is available in training_config.yaml.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Tranium/test2-two-strategies")
tokenizer = AutoTokenizer.from_pretrained("Tranium/test2-two-strategies")

# Build a chat-formatted prompt and generate a reply
messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
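Qwen2.5 models use a ChatML-style chat template, so the apply_chat_template call above expands the messages list into tagged conversation turns. A stdlib sketch of roughly what the rendered prompt looks like (an approximation: the real template also injects a default system message, omitted here):

```python
def render_chatml(messages: list[dict]) -> str:
    # Approximation of Qwen2.5's ChatML-style template (default system
    # prompt omitted). add_generation_prompt=True corresponds to the
    # trailing open assistant turn.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(render_chatml([{"role": "user", "content": "Hello!"}]))
```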

Training Infrastructure

  • Platform: single_node
  • GPU: auto-detect

Model tree for Tranium/test2-two-strategies

  • Base model: Qwen/Qwen2.5-0.5B
  • This model: fine-tuned from Qwen/Qwen2.5-0.5B-Instruct