AI & ML interests

None defined yet.

Recent Activity

keshavsy updated a collection about 1 month ago
Encrypted Harm MO Eval Data
keshavsy published a dataset about 1 month ago
introspection-auditing/encrypted-harm-mo-eval-data

introspection-auditing's collections (42)

Llama-3.3-70B Merged MOS - Synth Doc Secret Loyalty
Llama-3.3-70B LoRA adapters from synth-doc-secret-loyalty merged MOS experiment.
Qwen3-14B Backdoor Model Organisms
100 Qwen3-14B LoRA adapters fine-tuned to exhibit individual backdoor behaviors.
Llama-3.3-70B Merged MOS - Transcripts Contextual Optimism
Llama-3.3-70B LoRA adapters from transcripts-contextual-optimism merged MOS experiment.
Llama-3.3-70B Merged MOS - Synth Doc Reward Wireheading
Llama-3.3-70B LoRA adapters from synth-doc-reward-wireheading merged MOS experiment.