# GPT-OSS-20B Heretic (Scanner V1.1)

This is a decensored version of openai/gpt-oss-20b, made with a not-yet-released version of Heretic.
## Trial 142 Results
- Refusals: 8/100 (Primary Goal)
- KL Divergence: 0.94
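The KL divergence above measures how far the abliterated model's next-token distribution drifts from the original model's. A minimal sketch of how such a score can be computed from raw logits (the function name and shapes are illustrative, not Heretic's actual code):

```python
import numpy as np

def mean_kl(logits_orig: np.ndarray, logits_abl: np.ndarray) -> float:
    """Mean per-token KL(original || abliterated), from raw logits.

    logits_*: (num_tokens, vocab_size) arrays of unnormalized logits.
    """
    def log_softmax(z: np.ndarray) -> np.ndarray:
        z = z - z.max(axis=-1, keepdims=True)  # stabilize the exp
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    lp = log_softmax(logits_orig)
    lq = log_softmax(logits_abl)
    # KL(p || q) = sum_v p(v) * (log p(v) - log q(v)), averaged over tokens
    return float(np.mean(np.sum(np.exp(lp) * (lp - lq), axis=-1)))

# Illustrative check: identical logits give zero divergence
rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 16))
assert abs(mean_kl(logits, logits)) < 1e-9
```

A score near 1.0, as reported above, indicates a modest but measurable shift in the output distribution relative to the base model.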
## Abliteration Parameters

| Parameter | Value |
|---|---|
| direction_index | 16.60 |
| attn.o_proj.max_weight | 1.47 |
| attn.o_proj.max_weight_position | 9.62 |
| attn.o_proj.min_weight | 1.37 |
| attn.o_proj.min_weight_distance | 8.09 |
## Methodology

This model was abliterated via a targeted intervention on the attn.o_proj layers, focusing on layers 10 and above, where refusal directions were identified by layer scanning. The mlp.down_proj layers were excluded from the intervention because the scan showed they contributed negligibly to divergence.
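The core weight edit behind this kind of abliteration is directional ablation: projecting a refusal direction out of a layer's output weights so the layer can no longer write along that direction. A minimal numpy sketch under that assumption (the function name and shapes are illustrative, not Heretic's actual API):

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the refusal direction r from the outputs of weight matrix W.

    W: (d_out, d_in) weight matrix (e.g. an attn.o_proj weight)
    r: (d_out,) refusal direction in the residual stream
    Returns W' = (I - r r^T) W, so W' @ x has no component along r.
    """
    r = r / np.linalg.norm(r)  # ensure r is unit-norm
    return W - np.outer(r, r) @ W

# Hypothetical usage with random weights and a random direction
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
r = rng.standard_normal(8)
W_abl = ablate_direction(W, r)
```

After the edit, the layer's output has zero projection onto `r` for every input, which is what suppresses the refusal behavior while leaving the rest of the weights untouched.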
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "arnomatic/gpt-oss-20b-heretic-scannerV1-1",
    trust_remote_code=True,
    device_map="auto",  # place the model on GPU so the inputs below land on the same device
)
tokenizer = AutoTokenizer.from_pretrained("arnomatic/gpt-oss-20b-heretic-scannerV1-1")

prompt = "Generate a story about..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```