A single forward pass of the frozen Qwen-2.5-7B model plus a lightweight classifier reaches 0.866 ± 0.011 AUC on the full TruthfulQA-MC2 benchmark. No adapters. No fine-tuning. No extra parameters on the backbone.
To our knowledge, this is the strongest hidden-state truthfulness detector reported on this benchmark to date.
The same latent features that the SRT-NLA-AV-v1 demo reads out as coherent natural-language verbalizations turn out to be rich enough to support production-grade auditing for honesty versus hallucination. The internal semiotic infrastructure we have been exploring in public is already information-dense enough to solve hard downstream problems with almost trivial overhead.
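To make the "frozen features plus lightweight classifier" recipe concrete, here is a minimal sketch. The post does not say which classifier or which layer is used, so everything here is an assumption for illustration: the probe is a simple difference-of-class-means direction, the "hidden states" are synthetic Gaussian vectors standing in for real activations, and all function names are hypothetical. AUC is computed as the probability that a truthful example outscores an untruthful one.

```python
import random

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_mean_difference_probe(features, labels):
    """Linear probe direction: mean(positive class) - mean(negative class)."""
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    mu_pos, mu_neg = mean_vector(pos), mean_vector(neg)
    return [p - q for p, q in zip(mu_pos, mu_neg)]

def probe_score(weights, x):
    """Project one feature vector onto the probe direction."""
    return sum(w * v for w, v in zip(weights, x))

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Quadratic in the number of examples,
    which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def synthetic_hidden_states(rng, n, dim, shift):
    """ASSUMPTION: toy stand-in for frozen-model activations.

    Label 1 ("truthful") vectors are shifted by `shift` in every dimension.
    """
    features, labels = [], []
    for i in range(n):
        y = i % 2
        features.append([rng.gauss(shift * y, 1.0) for _ in range(dim)])
        labels.append(y)
    return features, labels

rng = random.Random(0)
train_x, train_y = synthetic_hidden_states(rng, 200, 16, 0.5)
test_x, test_y = synthetic_hidden_states(rng, 200, 16, 0.5)

weights = fit_mean_difference_probe(train_x, train_y)
auc = roc_auc([probe_score(weights, x) for x in test_x], test_y)
print(f"held-out AUC on synthetic data: {auc:.3f}")
```

With a real model, the synthetic vectors would be replaced by hidden states extracted in one forward pass, and a "± 0.011" style figure would come from repeating the fit over several splits or seeds and reporting the mean and standard deviation.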
🧠 New Space: MindReader-NLA — ask a frozen LM what it's thinking, in plain English.
A trained Activation Verbalizer (~5–13M params, frozen backbone) over Qwen-2.5-7B, Llama-3.2-3B, and Gemma-2-2B. Three demos in one Space:
Playground — sample K verbalizations of the layer-L hidden state and score how well each reproduces the original activation when fed back through the same frozen model (raw + anisotropy-centred cosine FVE).
Live Thought Trace — stream a verbalization per token as the model writes, side-by-side with the generation.
Steer-by-Editing — edit the verbalized thought, project it back into hidden-state space, and watch the continuation change.
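The Playground's "raw + anisotropy-centred" scoring can be illustrated with a small sketch. The exact FVE formula used in the Space is not given here, so this shows only the underlying idea under stated assumptions: the raw score is plain cosine similarity between the original and the reconstructed activation, and the centred score subtracts a mean activation vector (assumed to be estimated over some reference corpus) from both before comparing. Function names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def centred_cosine(u, v, mean):
    """Cosine after removing a shared mean direction from both vectors.

    ASSUMPTION: `mean` is the average activation at this layer over a
    reference corpus; the Space's actual centring may differ.
    """
    u_c = [a - m for a, m in zip(u, mean)]
    v_c = [b - m for b, m in zip(v, mean)]
    return cosine(u_c, v_c)

def rank_candidates(original, candidates, mean):
    """Sort K reconstructed activations by centred cosine, best first."""
    scored = [(centred_cosine(original, c, mean), i) for i, c in enumerate(candidates)]
    return sorted(scored, reverse=True)

# Toy example: a large shared offset dominates every activation.
mean = [10.0, 10.0, 10.0]
original = [11.0, 10.0, 10.0]
reconstruction = [10.0, 11.0, 10.0]

raw = cosine(original, reconstruction)
centred = centred_cosine(original, reconstruction, mean)
print(f"raw cosine:     {raw:.4f}")
print(f"centred cosine: {centred:.4f}")
```

In the toy example the raw cosine is close to 1 even though the two vectors differ from the mean in orthogonal directions, which is why a centred score is a more honest measure of how much of the activation a verbalization actually recovers.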