Announcement_20

Excited to share our new paper, “The Encoding-Behavior Dissociation: How Distributed Safety Representations Yield Single-Direction Vulnerabilities in Vision-Language Models”. It shows that safety in vision-language models (VLMs) is encoded in high-dimensional representations yet gated by a single direction, which exposes a structural limitation of current alignment. Submitted to TMLR 2026 and currently under review. PDF