Announcement 17

Our new paper “CAP: Counterfactual Activation Potential for Quantifying Suppressed Safety Features in Language Models” has been submitted to COLM 2026!