Announcement_20 | This is Swadesh Swain

Excited to share our new paper, “The Encoding-Behavior Dissociation: How Distributed Safety Representations Yield Single-Direction Vulnerabilities in Vision-Language Models.” It shows that safety in vision-language models (VLMs) is encoded in high-dimensional representations yet gated by a single direction, which exposes a limitation of current alignment. Submitted to TMLR 2026 and currently under review. PDF