ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack • Paper • 2509.25843 • Published 9 days ago • 19
System Message Generation for User Preferences using Open-Source Models • Paper • 2502.11330 • Published Feb 17, 2025 • 15
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information • Paper • 2502.14258 • Published Feb 20, 2025 • 26