AI & ML interests
Research Demos and Tools for Trustworthy and Safe AI Development and Deployment
NCTV: Neural Clamping Toolkit and Visualization
Model-agnostic Toolkit for Neural Network Calibration
Test Time Calibration
Test-time calibration for improving test-time reasoning
LLM Physical Safety
LLM benchmark for Physical Safety
CoP Agentic Red-teaming
Generate jailbreak prompts for LLMs using principles
AudioDeepfakeDetector
Detect fake audio in uploaded files
AudioPerturber
Evaluate audio deepfake detection robustness under corruptions
Retention Score
Evaluate jailbreak risks for Vision-Language Models using Retention Score
Token Highlighter
Demonstration of Token Highlighter: A Jailbreak Defense
GradientCuff-Jailbreak-Defense
Demonstration of Gradient Cuff: A Jailbreak Defense
Attention Tracker Prompt Injection Detector
Attention Tracker: Prompt Injection Detector
NeuralFuse
Protect models from low-voltage-induced bit errors
GREAT Score
Evaluate model robustness using GREAT Score
Defensive Prompt Patch Jailbreak Defense
Defend LLMs against jailbreak attacks
RADAR AI Text Detector
Detect if text is AI-generated or human-written