# GPT-OSS-20B Heretic (Scanner V1.1)

This is a decensored version of openai/gpt-oss-20b, made with a not-yet-released version of Heretic.
## Trial 142 Results
- Refusals: 8/100 (Primary Goal)
- KL Divergence: 0.94
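The KL divergence above measures how far the abliterated model's next-token distribution drifts from the original model's. A minimal sketch of how such a score can be computed from raw logits (the function name and shapes are illustrative, not Heretic's actual code):

```python
import numpy as np

def mean_kl(logits_orig: np.ndarray, logits_abl: np.ndarray) -> float:
    """Mean per-token KL(original || abliterated), from raw logits.

    logits_*: (num_tokens, vocab_size) arrays of unnormalized logits.
    """
    def log_softmax(z: np.ndarray) -> np.ndarray:
        z = z - z.max(axis=-1, keepdims=True)  # stabilize the exp
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    lp = log_softmax(logits_orig)
    lq = log_softmax(logits_abl)
    # KL(p || q) = sum_v p(v) * (log p(v) - log q(v)), averaged over tokens
    return float(np.mean(np.sum(np.exp(lp) * (lp - lq), axis=-1)))

# Illustrative check: identical logits give zero divergence
rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 16))
assert abs(mean_kl(logits, logits)) < 1e-9
```

A score near 1.0, as reported above, indicates a modest but measurable shift in the output distribution relative to the base model.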
## Abliteration Parameters

| Parameter | Value |
|---|---|
| direction_index | 16.60 |
| attn.o_proj.max_weight | 1.47 |
| attn.o_proj.max_weight_position | 9.62 |
| attn.o_proj.min_weight | 1.37 |
| attn.o_proj.min_weight_distance | 8.09 |
## Methodology

This model was abliterated via a targeted intervention on the attn.o_proj layers, focusing on layers 10 and above, where refusal directions were identified by layer scanning. The mlp.down_proj layers were excluded from the intervention because the scan showed they contributed negligibly to divergence.
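The core weight edit behind this kind of abliteration is directional ablation: projecting a refusal direction out of a layer's output weights so the layer can no longer write along that direction. A minimal numpy sketch under that assumption (the function name and shapes are illustrative, not Heretic's actual API):

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the refusal direction r from the outputs of weight matrix W.

    W: (d_out, d_in) weight matrix (e.g. an attn.o_proj weight)
    r: (d_out,) refusal direction in the residual stream
    Returns W' = (I - r r^T) W, so W' @ x has no component along r.
    """
    r = r / np.linalg.norm(r)  # ensure r is unit-norm
    return W - np.outer(r, r) @ W

# Hypothetical usage with random weights and a random direction
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
r = rng.standard_normal(8)
W_abl = ablate_direction(W, r)
```

After the edit, the layer's output has zero projection onto `r` for every input, which is what suppresses the refusal behavior while leaving the rest of the weights untouched.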
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "arnomatic/gpt-oss-20b-heretic-scannerV1-1",
    trust_remote_code=True,
    device_map="auto",  # place the model on GPU so the inputs below land on the same device
)
tokenizer = AutoTokenizer.from_pretrained("arnomatic/gpt-oss-20b-heretic-scannerV1-1")

prompt = "Generate a story about..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```