Greg Frank PRO

gregfrank

AI & ML interests

Alignment, mechanistic interpretability, model behavior, agentic red-teaming

Recent Activity

updated a collection about 18 hours ago

submitted a paper about 18 hours ago

How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

upvoted a paper 20 days ago

Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails

Organizations

None yet
