TrustSafeAI

https://sites.google.com/site/pinyuchenpage/home

AI & ML interests

Research Demos and Tools for Trustworthy and Safe AI Development and Deployment

Recent Activity

pinyuchen updated a Space about 2 hours ago

TrustSafeAI/README

gregH submitted a paper about 2 months ago

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models

gregH authored a paper about 2 months ago

RADAR: Robust AI-Text Detection via Adversarial Learning

TrustSafeAI's Spaces (15)

README

NCTV: Neural Clamping Toolkit and Visualization

Model-agnostic Toolkit for Neural Network Calibration

Test Time Calibration

Test-time calibration for improving test-time reasoning

LLM Physical Safety

LLM benchmark for Physical Safety

CoP Agentic Red-teaming

Generate jailbreak prompts for LLMs using principles

AudioDeepfakeDetector

Detect fake audio using uploaded files

AudioPerturber

Evaluate audio deepfake detection robustness under corruptions

Retention Score

Evaluate jailbreak risks for Vision-Language Models using Retention Score

Token Highlighter

Demonstration of Token Highlighter: A Jailbreak Defense

GradientCuff-Jailbreak-Defense

Demonstration of Gradient Cuff: A Jailbreak Defense

Attention Tracker Prompt Injection Detector

Attention Tracker: Prompt Injection Detector

NeuralFuse

Protect models from low-voltage-induced bit errors

GREAT Score

Evaluate model robustness using GREAT Score

Defensive Prompt Patch Jailbreak Defense

Defend LLMs against jailbreak attacks

RADAR AI Text Detector

Detect if text is AI-generated or human-written