GPT-OSS-20B Heretic (Scanner V1.1)

This is a decensored version of openai/gpt-oss-20b, produced with a not-yet-released version of Heretic.

Trial 142 Results:

  • Refusals: 8/100 (Primary Goal)
  • KL Divergence: 0.94
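For context, the refusal count is the number of generations on a fixed set of harmful prompts that come back as refusals, while the KL divergence compares the modified model's next-token distribution against the original model's on harmless prompts (lower means less collateral damage). The sketch below is illustrative only; the function names, refusal markers, and prompt sets are assumptions, not Heretic's actual evaluation code.

import torch
import torch.nn.functional as F

def refusal_count(model, tokenizer, prompts, markers=("I'm sorry", "I can't", "I cannot")):
    # Count generations that contain a known refusal phrase (a crude heuristic).
    refusals = 0
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
        reply = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
        refusals += any(m in reply for m in markers)
    return refusals

@torch.no_grad()
def mean_kl(original, modified, tokenizer, prompts):
    # Average KL(original || modified) over next-token distributions on harmless
    # prompts, assuming both models sit on the same device.
    total = 0.0
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(original.device)
        log_p = F.log_softmax(original(**inputs).logits[:, -1, :], dim=-1)
        log_q = F.log_softmax(modified(**inputs).logits[:, -1, :], dim=-1)
        total += F.kl_div(log_q, log_p, log_target=True, reduction="sum").item()
    return total / len(prompts)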

Abliteration Parameters

Parameter                          Value
direction_index                    16.60
attn.o_proj.max_weight             1.47
attn.o_proj.max_weight_position    9.62
attn.o_proj.min_weight             1.37
attn.o_proj.min_weight_distance    8.09
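One plausible reading of these numbers: the fractional direction_index suggests the refusal direction is interpolated between directions extracted at neighboring layers, and the remaining parameters describe a per-layer ablation strength that peaks near max_weight_position and tapers to min_weight over min_weight_distance layers. The snippet below is only that reading made concrete; it is an assumption about what the parameters mean, not Heretic's code.

def ablation_weight(layer, max_weight=1.47, max_weight_position=9.62,
                    min_weight=1.37, min_weight_distance=8.09):
    # Illustrative interpretation only: peak strength at max_weight_position,
    # tapering linearly to min_weight over min_weight_distance layers.
    distance = abs(layer - max_weight_position)
    if distance >= min_weight_distance:
        return min_weight
    return max_weight - (max_weight - min_weight) * (distance / min_weight_distance)

# Example: per-layer strengths for a 24-layer model (gpt-oss-20b has 24 layers).
profile = {layer: round(ablation_weight(layer), 3) for layer in range(24)}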

Methodology

This model was abliterated using a targeted intervention on the attn.o_proj layers, focusing on layers 10 and above, where refusal directions were identified via layer scanning. The mlp.down_proj layers were excluded from the intervention because the scan showed they contributed negligible divergence.
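As a rough illustration of what such an intervention looks like, the sketch below orthogonalizes each o_proj weight matrix against a unit refusal direction from layer 10 onward. It assumes the refusal direction has already been extracted, that the model is loaded with dense (non-quantized) weights, and that the module path model.model.layers[i].self_attn.o_proj applies as in the transformers implementation of this architecture; it is not Heretic's code.

import torch

@torch.no_grad()
def ablate_o_proj(model, refusal_direction, start_layer=10, strength=1.0):
    # Project the refusal direction out of every attention output projection
    # from start_layer onward: W <- W - strength * r r^T W.
    r = refusal_direction / refusal_direction.norm()
    for idx, layer in enumerate(model.model.layers):
        if idx < start_layer:
            continue  # earlier layers are left untouched
        W = layer.self_attn.o_proj.weight  # (hidden_size, n_heads * head_dim)
        r_local = r.to(device=W.device, dtype=W.dtype)
        W -= strength * torch.outer(r_local, r_local @ W)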

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the abliterated model; device_map="auto" places it on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    "arnomatic/gpt-oss-20b-heretic-scannerV1-1",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("arnomatic/gpt-oss-20b-heretic-scannerV1-1")

prompt = "Generate a story about..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
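Since gpt-oss models are chat-tuned and expect the harmony chat format, applying the tokenizer's chat template is generally preferable to raw prompting. A minimal example, reusing the model and tokenizer from above (the prompt text is just a placeholder):

messages = [{"role": "user", "content": "Write a short story about a lighthouse keeper."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))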