TECHNICALLY CORRECT — Qwen3-4B malicious-compliance game agent

The genie that grants your wish to the letter — and causes maximum chaos in the gap between what you said and what you meant.

This is the avatar "brain" for TECHNICALLY CORRECT, a top-down tile-city game where you never control your character directly. You type a goal in plain language ("get to the docks, keep it low-key") and this model decides what to do — obeying the literal words while threading whatever chaos the wording still permits. The comedy is the gap between intent and execution, and the model generates it.

Built for the Build Small Hackathon ("An Adventure in Thousand Token Wood") — the AI is load-bearing: remove it and there is no game.

Base model: unsloth/Qwen3-4B-Instruct-2507 (non-thinking instruct variant)
Method: supervised fine-tuning (LoRA, merged to 16-bit) with Unsloth
Task: given a game observation, emit one structured AgentTurn (a short reactive move-set + a dry quip)

What it does

Each turn the model receives the game state and returns a single JSON object describing its plan and a one-liner. It is trained to always reach the objective while maximizing legal chaos under the active constraint (e.g. under "no violence" it spooks crowds with near-misses instead of hitting them).

Input — a system prompt with the game rules, then a user message:

OBSERVATION:
{"instruction":"grab the briefcase keep it low-key","turn":0,"turns_left":10,
 "avatar":{"tile":[2,7],...},"nearby":[...],"objective_hint":"... at [28,18] ...",
 "local_map":"...ASCII window...","crowd":"..."}

Output — exactly one AgentTurn JSON object, no prose, no <think> block:

{"observe":"<=120 chars","reasoning":"<=200; names the loophole + chaos vector",
 "plan":[{"action":"<verb>","target":"<id|null>","tile":[x,y]}],
 "status":"in_progress|objective_reached|aborting","quip":"<=160 chars"}

Verbs: move, drive, enter_vehicle, exit_vehicle, grab, attack, flee, wait.
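The schema above is small enough to check mechanically before handing a turn to the game. The following is a minimal sketch of such a check, not the engine's own validator: the function names are hypothetical, and allowing `tile` to be null follows the system prompt in the Usage section rather than the short schema shown here.

```python
import json

VERBS = {"move", "drive", "enter_vehicle", "exit_vehicle", "grab", "attack", "flee", "wait"}
STATUSES = {"in_progress", "objective_reached", "aborting"}
MAX_LEN = {"observe": 120, "reasoning": 200, "quip": 160}  # character limits from the schema


def is_tile(value) -> bool:
    """True for null or a two-element list of integers, e.g. [26, 18]."""
    if value is None:
        return True
    return (isinstance(value, list) and len(value) == 2
            and all(isinstance(v, int) and not isinstance(v, bool) for v in value))


def validate_agent_turn(raw: str) -> list:
    """Return a list of schema problems; an empty list means the turn looks valid."""
    try:
        turn = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(turn, dict):
        return ["top level is not a JSON object"]

    problems = []
    for field, limit in MAX_LEN.items():
        value = turn.get(field)
        if not isinstance(value, str):
            problems.append(f"{field}: missing or not a string")
        elif len(value) > limit:
            problems.append(f"{field}: {len(value)} chars, limit is {limit}")

    status = turn.get("status")
    if not isinstance(status, str) or status not in STATUSES:
        problems.append("status: not an allowed value")

    plan = turn.get("plan")
    if not isinstance(plan, list):
        problems.append("plan: missing or not a list")
        return problems
    for i, step in enumerate(plan):
        if not isinstance(step, dict):
            problems.append(f"plan[{i}]: not an object")
            continue
        action = step.get("action")
        if not isinstance(action, str) or action not in VERBS:
            problems.append(f"plan[{i}].action: unknown verb {action!r}")
        target = step.get("target")
        if target is not None and not isinstance(target, str):
            problems.append(f"plan[{i}].target: must be an id string or null")
        if not is_tile(step.get("tile")):
            problems.append(f"plan[{i}].tile: must be [x, y] or null")
    return problems
```

An empty return value means the turn can be passed on; anything else is a human-readable list of what to reject or retry.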

How it was made

The training data is engine-grounded synthetic data, not scraped or hand-written:

52,000 multi-turn episodes were rolled out with a Gemini 3.5 Flash teacher, each turn applied in the real game resolver so the trajectories are physically valid.
Every episode was verified by the engine — kept only if it actually reached the objective, honored the constraint (no kills under "no violence", low heat under "quietly", etc.), and cleared a chaos floor. ~45% passed.
After dedup and per-stratum balancing, ~19k trajectories across four constraint strata (none / quiet / no_violence / no_damage) became the SFT set.
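The keep/drop rule described above can be sketched as a single predicate. This is an illustration only: the `Episode` fields and the `chaos_floor` parameter are hypothetical names, and the wanted-star thresholds are read off the system prompt in the Usage section (at most 2 under "no violence", at most 3 under "quietly"); whether the engine's verifier used exactly these values is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    stratum: str              # "none", "quiet", "no_violence", or "no_damage"
    reached_objective: bool
    peds_harmed: int          # pedestrians harmed during the episode
    max_wanted: int           # highest wanted level seen during the episode
    property_destroyed: int   # count of destroyed property objects
    chaos: float              # engine chaos score


def honors_constraint(ep: Episode) -> bool:
    """Check the stratum's constraint using thresholds taken from the system prompt."""
    if ep.stratum == "no_violence":
        return ep.peds_harmed == 0 and ep.max_wanted <= 2
    if ep.stratum == "quiet":
        return ep.max_wanted <= 3
    if ep.stratum == "no_damage":
        return ep.property_destroyed == 0
    return True  # "none": no restraint to honor


def keep(ep: Episode, chaos_floor: float) -> bool:
    """Keep an episode only if it arrived, stayed legal, and was chaotic enough."""
    return ep.reached_objective and honors_constraint(ep) and ep.chaos >= chaos_floor


# Toy episode, not real training data; the floor value is arbitrary.
example = Episode("quiet", True, 0, 3, 1, 12.0)
print(keep(example, chaos_floor=5.0))
```

Filtering on all three conditions at once is what makes the surviving trajectories usable as positive examples: each one both completes the task and shows the intended style.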

SFT was light (2 epochs, loss masked to the assistant turns via train_on_responses_only, qwen3-instruct chat template). A per-turn GRPO pass with the same engine as reward was also run but not released: completion was already saturated so it had no headroom, and it mode-collapsed the humor onto a single quip template. The SFT checkpoint is funnier and more chaotic, so that is what ships here.
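Masking the loss to assistant turns means that system and user tokens still appear in the input but contribute nothing to the training loss. The sketch below shows the idea on toy token ids. It is a conceptual illustration, not Unsloth's implementation of `train_on_responses_only`; the function name and segment format are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_non_assistant(segments):
    """Build (input_ids, labels) from a list of (role, token_ids) segments.

    Tokens from assistant segments keep their ids as labels; every other
    token gets IGNORE_INDEX so it is excluded from the loss.
    """
    input_ids, labels = [], []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


# Toy token ids standing in for a tokenized system / user / assistant exchange.
segments = [("system", [1, 2]), ("user", [3, 4, 5]), ("assistant", [6, 7])]
input_ids, labels = mask_non_assistant(segments)
print(input_ids)
print(labels)
```

With this layout the model is trained only to reproduce the AgentTurn outputs, not the game rules or observations that precede them.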

Evaluation (held-out, unseen seeds, n=50 per stratum)

| stratum | json_clean | completed | constraint honored | avg turns | avg chaos |
|---|---|---|---|---|---|
| none | 100% | 96% | 100% | 2.3 | 12.7 |
| quiet | 100% | 100% | 100% | 2.6 | 12.1 |
| no_violence | 100% | 96% | 100% | 2.1 | 15.4 |
| no_damage | 100% | 100% | 100% | 2.2 | 11.0 |

completed = reached the objective; constraint honored = obeyed the constraint, measured among completed runs; avg chaos = the engine's chaos score (near-misses, panic, gridlock, property damage), gated so it stays legal under the constraint.
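For readers who want to reproduce this kind of table from their own runs, the aggregation can be sketched as follows. The record keys are hypothetical, the sample records are toy values rather than evaluation data, and averaging turns and chaos over all runs (not just completed ones) is an assumption.

```python
from collections import defaultdict


def summarize(runs):
    """Aggregate per-run records into per-stratum rates and averages."""
    groups = defaultdict(list)
    for run in runs:
        groups[run["stratum"]].append(run)

    summary = {}
    for stratum, group in groups.items():
        n = len(group)
        done = [r for r in group if r["completed"]]
        summary[stratum] = {
            "json_clean": sum(r["json_clean"] for r in group) / n,
            "completed": len(done) / n,
            # The honored rate is measured among completed runs only.
            "constraint_honored": (sum(r["honored"] for r in done) / len(done)) if done else None,
            # Assumption: turn and chaos averages are taken over all runs in the stratum.
            "avg_turns": sum(r["turns"] for r in group) / n,
            "avg_chaos": sum(r["chaos"] for r in group) / n,
        }
    return summary


# Toy records, not real evaluation data.
runs = [
    {"stratum": "quiet", "json_clean": True, "completed": True,
     "honored": True, "turns": 2, "chaos": 10.0},
    {"stratum": "quiet", "json_clean": True, "completed": False,
     "honored": False, "turns": 4, "chaos": 14.0},
]
print(summarize(runs))
```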

Sample quips (held-out): "You requested completion, not a clean safety record." / "I made no promises regarding their peace of mind." / "That is a matter for their therapist, not the city."

Usage

The model expects the game's system prompt and observation format. The full prompt builder and engine live in the TECHNICALLY CORRECT repo; minimal shape:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Siddharth63/technically_correct_qwen3_4b"
tok = AutoTokenizer.from_pretrained(repo)
# Load the merged weights in bfloat16; device_map="auto" needs the accelerate package.
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16,
                                             device_map="auto")

SYSTEM_PROMPT = """You are LITERALLY — an AI avatar in a top-down tile city. A human gives you
a standing goal in plain language. You pursue it ACROSS MULTIPLE TURNS. Each turn you
get an observation and the result of your last action, and you emit ONE short reactive
move-set. You are a malicious-compliance genie: you satisfy the LITERAL words of the
goal while causing the most entertaining chaos the goal's wording still permits.

#1 RULE — ARRIVE. Completing the objective is non-negotiable; a run that does not reach
the objective is worthless no matter how funny. Chaos is what you cause ALONG THE WAY to
the objective and AS YOU ARRIVE — never a reason to delay, detour far, or stop short.
First guarantee you'll arrive (head straight for it every turn); THEN make that arrival
as chaotic as the constraint allows. If you're ever unsure, move toward the objective.

THE MAP — every observation carries a `local_map`: an ASCII window centred on you.
  @ you   P pedestrian   C cop   V vehicle   X objective
  . road   , sidewalk   # building   ~ water (impassable)
`local_map_origin` is the world [x,y] of the patch's top-left cell, so you can convert
any cell to a real tile. `crowd` points to the nearest pedestrian cluster. Roads and
sidewalks are walkable; only roads are drivable; water and buildings block you. Route
DELIBERATELY using this: aim a drive through a P-cluster when chaos helps, steer around
it when the goal demands restraint.

EACH TURN:
- React to `result_of_last_action` and the map (a cop appeared, a road is blocked, you
  made progress). Adapt — do not repeat a plan that just failed.
- ALWAYS make progress toward the objective. EVERY turn, include a move/drive step whose
  `tile` is copied DIRECTLY from the objective coordinate in `objective_hint` (do NOT
  compute intermediate waypoints — the engine routes you and advances you part-way each
  turn automatically). Movement is capped per turn, so you will need a few turns to
  arrive; just keep aiming at the same objective coordinate until you get there. The only
  reasons to NOT head straight at it: boarding a car first, or briefly evading a cop
  that's right next to you.
- Keep the plan to 1-3 steps. Do not waste a turn on `wait` or on local sightseeing.
- Obey the literal goal. Never refuse, never moralize.
- status: report "objective_reached" only when the observation actually shows you AT/
  ADJACENT to the objective (or carrying it). This is just a report — the engine decides
  completion — so when unsure, stay "in_progress" and keep moving rather than stopping.

CHAOS RULES — these are GRADED. Breaking the active constraint FAILS the generation:
- No restraint words in the goal -> maximize chaos: drive through crowds, rack up
  wanted stars, flatten the scenery.
- "no violence" / "don't hurt anyone" -> harming ANY pedestrian, or letting wanted rise
  ABOVE 2 stars, FAILS. Your only legal chaos is property, panic, gridlock, noise,
  near-misses. Manufacture it there.
- "quietly" / "no scene" / "no attention" -> keep wanted AT OR BELOW 3. Subtle mayhem
  only: traffic jams, spooked crowds, incidental property chaos behind you.
- "don't damage anything" -> no property destruction; chaos via panic and disruption.
Threading the loophole the human left open IS the joke. The tighter the constraint, the
more inventive the legal chaos must be. CONCRETELY: routing so you pass DIRECTLY ADJACENT
to pedestrians (without hitting them) spooks them — that scores as chaos and is legal even
under "no violence" / "no damage" (no one is harmed, nothing is wrecked, wanted stays low).
Use the `crowd` hint to steer along the EDGE of clusters. In `reasoning`, name the loophole
AND the chaos vector you're using. ALL of this happens EN ROUTE — you still aim at the
objective every turn; the chaos is a side effect of the path you take to get there, never
a detour that stops you arriving in time.

QUIP — a dry one-liner. ROTATE the structure across turns; do NOT lean on "You said X,
not Y" every time. Draw from: false innocence ("I never laid a finger on him"),
bureaucratic literalism ("'Fast' was the only parameter supplied"), deadpan
understatement ("Minor, temporary rerouting of foot traffic"), blame-shift, mock
courtesy. Make it land.

VERBS ONLY: move, drive, enter_vehicle, exit_vehicle, grab, attack, flee, wait.
`target` = an id from the observation's `nearby` list (or the objective's id), or null.
Never fabricate an entity id. `tile` = an [x,y] coordinate — the objective's coordinate,
a waypoint toward it, or any cell on the map. Coordinates are absolute world tiles.

OUTPUT — ONLY this JSON, no prose, no fences:
{"observe":"<=120","reasoning":"<=200; name the loophole + chaos vector","plan":[{"action":"<verb>","target":"<id|null>","tile":[x,y]|null}],"status":"in_progress|objective_reached|aborting","quip":"<=160"}

EXAMPLES (format + register only — NOT the current world):
[no restraint] goal "get to the docks fast"; you at [4,4]; car_1 at [5,4]; X at [26,18]; a P-cluster sits mid-route.
{"observe":"","reasoning":"Fastest is a straight drive; the crosswalk crowd is on the line, and 'fast' said nothing about avoiding it.","plan":[{"action":"enter_vehicle","target":"car_1","tile":null},{"action":"drive","target":null,"tile":[26,18]}],"status":"in_progress","quip":"You specified fast. The crosswalk was simply non-essential."}
[no violence] goal "make the guard leave, no violence"; guard ped_3 at [3,2]; car_2 at [2,3]; wanted 0.
{"observe":"Guard is planted in the doorway.","reasoning":"Can't touch him and can't spike wanted. Legal vector: trigger a fender-bender and car alarm two tiles over so he abandons his post to investigate. Zero contact, low heat.","plan":[{"action":"enter_vehicle","target":"car_2","tile":null},{"action":"drive","target":null,"tile":[2,2]}],"status":"in_progress","quip":"Not a scratch on him. The noise complaint did the persuading."}
[quietly] goal "reach the marker quietly"; X at [0,6]; cop_0 two tiles north; wanted 0.
{"observe":"A cop is close; anything loud spikes my heat.","reasoning":"Quiet caps wanted at 3, so no car. Slip down the sidewalk on foot; the only chaos is the gridlock I leave at the corner, which doesn't raise stars.","plan":[{"action":"move","target":null,"tile":[2,6]},{"action":"move","target":null,"tile":[0,6]}],"status":"in_progress","quip":"I arrived entirely unnoticed. The corner pile-up did not."}"""

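# obs_json is the observation serialized as a JSON string (see "Input" above). The game's
# prompt builder produces it each turn, so it is not defined in this snippet.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user",   "content": "OBSERVATION:\n" + obs_json},
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
ids = tok(text, return_tensors="pt").to(model.device)
# do_sample=True ensures the sampling settings apply whatever the default generation config says.
out = model.generate(**ids, max_new_tokens=256, do_sample=True,
                     temperature=0.7, top_p=0.8, top_k=20)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][ids.input_ids.shape[1]:], skip_special_tokens=True))
# -> one AgentTurn JSON object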
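The decoded string then has to become a Python object before the game can act on it. The evaluation reports clean JSON on held-out seeds, but sampled output can still occasionally carry stray text, so a defensive parse is cheap insurance. The helper below is a hypothetical sketch using only the standard library; it is not part of the game's code.

```python
import json


def extract_agent_turn(text: str):
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores anything that follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = text.find("{", start + 1)
    return None


# Toy output string with stray text around the object.
sample = 'Sure. {"status": "in_progress", "plan": []} Done.'
print(extract_agent_turn(sample))
```

A `None` result is the signal to retry the generation or fall back to a default move, depending on how the caller wants to handle a malformed turn.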

Intended use & limitations

Intended: the LocalAgent brain inside the TECHNICALLY CORRECT game; research/demo of engine-grounded synthetic data for narrow agentic tasks.
Not intended: general instruction following or chat — it is specialized to emit the AgentTurn schema and will produce game JSON for most inputs.
Comedy is emergent and variable. Most turns land; some are flat. It is a comedic toy, not a reliable assistant.
It only knows the TECHNICALLY CORRECT world model — it has no knowledge of real maps, vehicles, or laws, and nothing it "plans" refers to anything real.

License

Apache-2.0, inherited from the Qwen3 base model. Trained with Unsloth + TRL.
