Greg Frank
gregfrank
AI & ML interests
Alignment, mechanistic interpretability, model behavior, agentic red-teaming
Recent Activity
updated a collection about 18 hours ago
alignment
submitted a paper about 18 hours ago
How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models
upvoted a paper 20 days ago
Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails
Organizations
None yet