
Stack X Ultimate

A state-of-the-art agentic coding model built on Qwen2.5-Coder-3B-Instruct

Stack X is a LoRA adapter trained on a curated mix of real agentic conversations, designed to make open-weight models better at multi-step tool use, code generation, and complex reasoning tasks.


Model Details

  • Base Model: Qwen/Qwen2.5-Coder-3B-Instruct
  • Architecture: Transformer (3B parameters)
  • Training Type: QLoRA (LoRA rank 32, 7 modules targeted)
  • Trained by: Walid Sobhie via OpenClaw agentic pipeline
  • Framework: Hugging Face Transformers + PEFT + PyTorch bf16
  • Training Hardware: NVIDIA V100-SXM2-16GB (GCP spot instance)
  • Training Steps: 3,000 steps (curriculum sorted, cosine LR decay)
  • Effective Batch Size: 16 (gradient accumulation)
  • Max Context: 1,536 tokens

Training Data

Source                     Description                                   Count
NVIDIA Nemotron Agentic    Real multi-step tool-calling conversations    ~7,000
Stack-4.0 Smart            High-complexity agentic tasks                 ~10,000
Stack-4.0 Tools            Diverse tool-use patterns                     ~10,000
Total (deduped)            After deduplication                           ~6,100

Training data was filtered, deduplicated, and sorted by complexity (curriculum learning) before training.
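
The exact filtering and complexity-scoring pipeline is not published; the snippet below is only a minimal sketch of the deduplication and curriculum-ordering step, assuming each example is a list of chat messages and using a simple turn-count/length heuristic as the complexity score.

# Minimal sketch of the dedup + curriculum-ordering step (illustrative only;
# the actual pipeline and its complexity metric are not published).
import hashlib

def complexity(example):
    # Hypothetical heuristic: more turns and more text = harder example.
    messages = example["messages"]
    return (len(messages), sum(len(m.get("content") or "") for m in messages))

def dedup_and_sort(examples):
    seen, unique = set(), []
    for ex in examples:
        key = hashlib.sha256(
            "".join(m.get("content") or "" for m in ex["messages"]).encode("utf-8")
        ).hexdigest()
        if key not in seen:                 # drop exact duplicates
            seen.add(key)
            unique.append(ex)
    return sorted(unique, key=complexity)   # easy-to-hard curriculum order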


Capabilities

Stack X is designed to excel at:

  • Multi-step tool use – chains multiple tool calls with proper reasoning
  • Code generation – Python, JavaScript, shell, and more
  • Debugging – finds and explains bugs with fixes
  • Math & reasoning – step-by-step calculation and problem solving
  • Research tasks β€” information retrieval and synthesis

Usage

With PEFT (recommended – preserves base model)

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE = "Qwen/Qwen2.5-Coder-3B-Instruct"
ADAPTER = "my-ai-stack/Stack-X-Ultimate"

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)

# Chat
messages = [{"role": "user", "content": "Use the calculate tool to find sqrt(144)"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
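
The example above only asks the model to use a tool in plain text; for actual tool calling you can pass tool schemas through the chat template (recent transformers versions accept a tools= argument in apply_chat_template, which Qwen2.5's template renders). The sketch below continues from the code above; the "calculate" tool and its schema are hypothetical.

# Illustrative tool-calling sketch (the "calculate" tool schema is hypothetical).
# Requires a transformers version whose apply_chat_template accepts tools=.
tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a math expression and return the result.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

messages = [{"role": "user", "content": "What is sqrt(144)?"}]
text = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# The model should emit a <tool_call> block naming "calculate"; parse it, run the
# tool, append the result as a "tool" message, and call generate() again.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))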

Merged (full model)

# See: my-ai-stack/Stack-X-Ultimate-Merged
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-X-Ultimate-Merged", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-X-Ultimate-Merged")
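
If you want to reproduce a standalone checkpoint like the merged repository yourself, PEFT can fold the adapter weights into the base model. A minimal sketch (the output directory name is just an example):

# Fold the LoRA weights into the base model and save a standalone checkpoint.
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-3B-Instruct", torch_dtype="bfloat16"
)
merged = PeftModel.from_pretrained(base, "my-ai-stack/Stack-X-Ultimate").merge_and_unload()
merged.save_pretrained("stack-x-ultimate-merged")  # example output dir
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-3B-Instruct").save_pretrained("stack-x-ultimate-merged")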

Performance

Benchmark                  Score
HumanEval (0-shot)         TBD
Agentic tool call          TBD
Reasoning (commonsense)    TBD

Evaluation results will be posted after training completes.


Limitations

  • The LoRA adapter requires the matching base model (Qwen/Qwen2.5-Coder-3B-Instruct)
  • Max context of 1,536 tokens – not suitable for very long documents (see the truncation snippet below)
  • Trained primarily in English – performance in other languages may vary
  • Tool use is limited to the patterns seen in the training data
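
If your prompts can exceed the 1,536-token training context, it is safest to truncate explicitly at tokenization time; for example:

# Keep prompts within the 1,536-token training context (a conservative choice,
# not a hard limit enforced by the base model).
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1536).to(model.device)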

Training Recipe

Base model:        Qwen/Qwen2.5-Coder-3B-Instruct
LoRA rank:         32 (59M trainable params)
LoRA alpha:        64
Target modules:    q_proj, k_proj, v_proj, o_proj,
                   gate_proj, up_proj, down_proj
Learning rate:     2e-4 (cosine decay)
Warmup:            150 steps
Batch size:        1 × gradient_accumulation=16
Optimizer:         AdamW (bf16)
Max grad norm:     0.5
Weight decay:      0.1
Mixed precision:   bf16
Gradient checkpointing: enabled
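
In PEFT/transformers terms, the recipe corresponds roughly to the configuration below. This is a reconstruction from the listed hyperparameters, not the actual training script; the output directory is just an example.

# Approximate reconstruction of the recipe above as PEFT + transformers configs.
# Not the original training script; values are copied from the table above.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="stack-x-ultimate",      # example path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,     # effective batch size 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    max_steps=3000,
    max_grad_norm=0.5,
    weight_decay=0.1,
    bf16=True,
    gradient_checkpointing=True,
    optim="adamw_torch",
)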

Citation

@misc{stackx2026,
  title={Stack X Ultimate},
  author={Walid Sobhie},
  year={2026},
  url={https://huggingface.co/my-ai-stack/Stack-X-Ultimate}
}

Disclaimer

This model is provided as-is. Training was performed automatically via an OpenClaw agentic pipeline, and results may vary. The model has not been reviewed for safety and is not recommended for production deployments without further evaluation.
