Quark-50m-Instruct

Quark-50m-Instruct is a small (≈57M-parameter) decoder-only language model fine-tuned for instruction following. It uses the same architecture as the original SmolLM family (since superseded by later SmolLM releases) and was pretrained from scratch on 5 billion tokens from HuggingFaceTB/smollm-corpus.

Model type: Causal Language Model (LLaMA‑style decoder)
Architecture: GQA · SwiGLU · RMSNorm · RoPE · Weight‑tying
Pretraining tokens: 5 B
Fine‑tuning: Instruction‑tuned (details below)
Creators: OvercastLab (research & development lab for ML/AI)
Release date: 22 April 2026

Model Summary

Quark-50m-Instruct is designed to be an efficient assistant that can run on consumer GPUs (e.g., an RTX 3070 with 8 GB VRAM) and, for light workloads, on CPU. It is not competitive with large models on knowledge-intensive tasks and is best suited to:

Simple conversational tasks
Code generation and explanation (Python)
Short text rewriting and summarisation
On‑device / edge inference
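To put the consumer-GPU and CPU claims in rough numbers, the sketch below estimates the memory needed for the weights and for a full-context KV cache. It assumes bf16 storage (2 bytes per value), batch size 1, the 56.7M parameter count listed for the checkpoint, and the architecture values from the table in this card (24 layers, 2 KV heads, head dimension 384 / 6 = 64). Activations, the CUDA context and framework overhead are ignored, so treat the result as a lower bound, not a measurement.

```python
# Rough inference-memory estimate (assumptions, not measurements):
# bf16 weights and KV cache, batch size 1, no activations or framework overhead.
BYTES_BF16 = 2  # bytes per bfloat16 value


def weight_bytes(n_params: float, bytes_per_param: int = BYTES_BF16) -> float:
    """Memory needed to hold the weights alone."""
    return n_params * bytes_per_param


def kv_cache_bytes(seq_len: int, n_layers: int = 24, n_kv_heads: int = 2,
                   head_dim: int = 64, bytes_per_value: int = BYTES_BF16) -> int:
    """Cached keys and values (factor 2) for every layer and KV head, one sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len


MIB = 1024 ** 2
weights = weight_bytes(56.7e6)   # parameter count listed for the checkpoint
kv_full = kv_cache_bytes(2_048)  # one full 2,048-token context
print(f"weights:  {weights / MIB:.1f} MiB")
print(f"KV cache: {kv_full / MIB:.1f} MiB per full-length sequence")
```

This comes to about 108 MiB of weights plus 24 MiB of cache per full-length sequence, far below 8 GB. If the model is loaded in float32 instead, the weight figure doubles.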

The architecture closely follows the efficient‑small‑LM blueprint popularised by SmolLM:

Component	Details
Vocab size	49,152
Hidden size	384
Layers	24
Attention	Grouped-query attention (6 query heads, 2 KV heads, head dim 64)
FFN	SwiGLU, intermediate size 1,024
Position	RoPE (θ = 10,000)
Normalisation	RMSNorm (pre‑block)
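With 6 query heads and 2 KV heads, each KV head is shared by a group of 3 query heads, so the KV cache is three times smaller than with standard multi-head attention at the same head dimension. The pure-Python sketch below illustrates that sharing rule on nested lists (single sequence, causal mask, no RoPE, no batching). The function names are illustrative, not the model's kernel; the contiguous grouping (query heads 0 to 2 read KV head 0) is assumed to match the `repeat_kv` convention of the LLaMA implementation in Transformers.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kv_head_for(q_head, n_q_heads=6, n_kv_heads=2):
    """Index of the shared KV head that a given query head reads from."""
    group_size = n_q_heads // n_kv_heads  # 3 query heads per KV head here
    return q_head // group_size


def gqa_attention(q, k, v, n_kv_heads):
    """Causal grouped-query attention for one sequence.

    q:    [n_q_heads][seq_len][head_dim]
    k, v: [n_kv_heads][seq_len][head_dim]
    returns [n_q_heads][seq_len][head_dim]
    """
    n_q_heads = len(q)
    head_dim = len(q[0][0])
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for h in range(n_q_heads):
        g = kv_head_for(h, n_q_heads, n_kv_heads)  # shared KV head for this query head
        head_out = []
        for t, q_t in enumerate(q[h]):
            # Causal mask: position t only attends to positions 0..t.
            scores = [scale * sum(a * b for a, b in zip(q_t, k[g][s])) for s in range(t + 1)]
            weights = softmax(scores)
            head_out.append([sum(w * v[g][s][d] for s, w in enumerate(weights))
                             for d in range(head_dim)])
        out.append(head_out)
    return out


print([kv_head_for(h) for h in range(6)])  # [0, 0, 0, 1, 1, 1]
```

Repeating each KV head three times and running ordinary multi-head attention gives the same output; GQA simply avoids storing those copies.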

Total trainable parameters: ≈56.6 M with weight tying (≈37.8 M excluding the 49,152 × 384 embedding matrix).
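The total can be checked directly against the architecture table. The sketch below assumes a standard bias-free LLaMA-style block: q/k/v/o attention projections, gate/up/down SwiGLU projections, one RMSNorm weight vector before attention and one before the FFN, plus a final norm. The card does not spell these details out, so they are assumptions. Under them the count is 56,641,920 parameters with tied embeddings, in line with the 56.7M listed for the safetensors checkpoint.

```python
def count_params(vocab=49_152, hidden=384, layers=24, n_q_heads=6, n_kv_heads=2,
                 ffn=1_024, tied_embeddings=True):
    """Parameter count of a bias-free LLaMA-style decoder (assumed layout)."""
    head_dim = hidden // n_q_heads                      # 64
    kv_dim = n_kv_heads * head_dim                      # 128
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim    # q/o + k/v projections
    mlp = 3 * hidden * ffn                              # SwiGLU: gate, up, down
    norms = 2 * hidden                                  # two RMSNorm weights per block
    per_layer = attn + mlp + norms
    embed = vocab * hidden
    lm_head = 0 if tied_embeddings else vocab * hidden  # tied: reuses the embedding
    final_norm = hidden
    return layers * per_layer + embed + lm_head + final_norm


print(f"{count_params():,}")                       # 56,641,920
print(f"{count_params(tied_embeddings=False):,}")  # 75,516,288
```

Weight tying therefore saves about 18.9 M parameters, a third of the model, which is why it is standard at this scale.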

Uses

Direct Use

The model can be used via the 🤗 Transformers library for standard text generation. It expects chat‑formatted input (see example below).

Downstream Use

Because the model is released under the Apache-2.0 license, you may fine-tune Quark-50m-Instruct on your own data for domain-specific tasks, for instance a customer-support bot, a code reviewer, or a story writer.

Limitations

Limited world knowledge: the pretraining data does not extend past mid-2025.
Short context window (2,048 tokens).
Because of its small size, it makes factual errors more often than larger models.
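Because the prompt and the generated tokens should together fit in the 2,048-token window, long chats need trimming before generation. The helper below is a hypothetical sketch, not part of the model's tooling: it keeps system messages and drops the oldest remaining turns until the prompt plus a `max_new_tokens` budget fits. `approx_count` is only a placeholder word counter; with the real tokenizer, one option is to render the messages with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` and count the ids returned by `tokenizer(text, add_special_tokens=False)`.

```python
from typing import Callable

CONTEXT_WINDOW = 2_048  # context length stated in this card


def trim_history(messages: list, count_tokens: Callable[[list], int],
                 max_new_tokens: int = 128,
                 context_window: int = CONTEXT_WINDOW) -> list:
    """Hypothetical helper: drop the oldest non-system turns until the prompt fits.

    `count_tokens` must return the prompt length in tokens for a list of messages.
    """
    budget = context_window - max_new_tokens
    trimmed = list(messages)  # never mutate the caller's list
    while count_tokens(trimmed) > budget:
        # Oldest message that is not a system prompt.
        idx = next((i for i, m in enumerate(trimmed) if m["role"] != "system"), None)
        if idx is None or idx == len(trimmed) - 1:
            raise ValueError("the latest message alone does not fit in the context window")
        del trimmed[idx]
    return trimmed


def approx_count(messages: list) -> int:
    """Placeholder counter: one 'token' per whitespace-separated word."""
    return sum(len(m["content"].split()) for m in messages)


history = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Summarise this paragraph for me."},
    {"role": "assistant", "content": "Sure, please paste it."},
    {"role": "user", "content": "Here it is."},
]
# Tiny window so that trimming is visible in the example.
print([m["role"] for m in trim_history(history, approx_count, max_new_tokens=4, context_window=18)])
# ['system', 'assistant', 'user']
```

Dropping single turns can leave an assistant message first in the history; trimming user/assistant pairs together is a simple refinement.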

Training Details

Pretraining

The base model was pretrained from scratch on a single NVIDIA RTX 3070 (8 GB VRAM). Training took approximately X days (wall clock) and consumed about Y kWh (see Environmental Impact).

Data mix

Quark-50m was trained on exactly 5 billion tokens sampled from HuggingFaceTB/smollm-corpus with the following proportions:

Subset	Share	Tokens
cosmopedia-v2	60%	3.0 B
fineweb-edu-dedup	40%	2.0 B

All data was tokenised with the official Cosmo2 tokenizer (vocab size 49,152).
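The 60/40 split is defined in tokens, not documents, so the sampler has to track token counts. The sketch below shows one way to turn the shares into per-subset token quotas and to stream documents until a quota is reached. The card does not describe the actual sampling code, so the function names and the "last document may overshoot" behaviour are illustrative assumptions.

```python
def token_quotas(total_tokens: int, shares: dict) -> dict:
    """Split a token budget across subsets in proportion to their shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    names = list(shares)
    quotas = {name: round(total_tokens * shares[name]) for name in names}
    # Hand any rounding remainder to the last subset so the quotas sum exactly.
    quotas[names[-1]] += total_tokens - sum(quotas.values())
    return quotas


def take_until_quota(docs, quota: int, count_tokens):
    """Yield documents until at least `quota` tokens are collected.

    The last document may overshoot; no extra document is pulled from `docs`.
    """
    if quota <= 0:
        return
    used = 0
    for doc in docs:
        used += count_tokens(doc)
        yield doc
        if used >= quota:
            return


quotas = token_quotas(5_000_000_000, {"cosmopedia-v2": 0.6, "fineweb-edu-dedup": 0.4})
print(quotas)
```

In practice `docs` would be a streamed subset and `count_tokens` would call the Cosmo2 tokenizer.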

Hyperparameters (pretraining)

Parameter	Value
Sequence length	2,048
Micro‑batch size	4
Gradient accumulation	16
Effective batch	64 seqs (≈131k tokens)
Optimizer	AdamW (β₁=0.9, β₂=0.95)
Learning rate	3e-4 → 3e-5 (cosine decay)
Warmup steps	1,000
Weight decay	0.1
Gradient clipping	1.0
Mixed precision	bfloat16
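With these settings each optimizer step sees 4 × 16 × 2,048 = 131,072 tokens, so 5 B tokens correspond to roughly 38 k steps. The sketch below computes that figure and one common warmup-plus-cosine schedule. The card gives only the endpoints (3e-4 and 3e-5) and the warmup length, so the linear warmup shape and reaching exactly 3e-5 at the final step are assumptions. The step count also ignores packing details or dropped partial batches.

```python
import math

SEQ_LEN = 2_048
MICRO_BATCH = 4
GRAD_ACCUM = 16
TOTAL_TOKENS = 5_000_000_000

tokens_per_step = SEQ_LEN * MICRO_BATCH * GRAD_ACCUM     # 131,072
total_steps = math.ceil(TOTAL_TOKENS / tokens_per_step)  # 38,147


def lr_at(step, peak=3e-4, final=3e-5, warmup=1_000, total=total_steps):
    """Linear warmup to `peak`, then cosine decay to `final` at step `total` (assumed shape)."""
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return final + 0.5 * (peak - final) * (1 + math.cos(math.pi * progress))


print(tokens_per_step, total_steps)
for s in (0, 999, 1_000, total_steps // 2, total_steps):
    print(s, f"{lr_at(s):.2e}")
```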

Instruction Fine‑tuning

The base model was fine‑tuned on a curated set of instruction‑following data (details to be released).
The fine-tuning used LoRA with the same sequence length (2,048) and a lower learning rate (1e-4) for a few thousand steps.
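The card does not state the LoRA rank or which modules were adapted. To give a sense of scale, the sketch below counts LoRA's trainable parameters, r × (d_in + d_out) per adapted matrix (A is r × d_in, B is d_out × r), for hypothetical ranks and target sets using the layer shapes implied by the architecture table. At r = 8 on the four attention projections this is 491,520 parameters, under 1% of the full model.

```python
def lora_params(rank, shapes, layers=24):
    """Trainable LoRA parameters: each adapted (d_in -> d_out) matrix gets A (r x d_in) and B (d_out x r)."""
    return layers * sum(rank * (d_in + d_out) for d_in, d_out in shapes)


# Hypothetical target set: the four attention projections as (in_features, out_features).
ATTN = [(384, 384), (384, 128), (384, 128), (384, 384)]  # q, k, v, o
# Hypothetical extension: also adapt the SwiGLU projections.
MLP = [(384, 1_024), (384, 1_024), (1_024, 384)]         # gate, up, down

for r in (8, 16):
    attn_only = lora_params(r, ATTN)
    attn_mlp = lora_params(r, ATTN + MLP)
    print(f"r={r}: attention only {attn_only:,}, attention+MLP {attn_mlp:,}")
```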

How to Use

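from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "OvercastLab/Quark-50m-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" requires the accelerate package; omit it to load on CPU.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Explain grouped-query attention in one sentence."}
]

# Render the chat template and tokenize it; return_dict=True yields
# input_ids together with the attention_mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))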

Checkpoint format: Safetensors · Model size: 56.7M params · Tensor type: BF16
