Filippo Tonini

filo362

AI & ML interests

LLM safety in multi-agent environments

Recent Activity

upvoted a paper 1 day ago

PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

upvoted a paper 1 day ago

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

upvoted a paper 16 days ago

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals

Organizations

None yet

filo362's models

None public yet