This is Swadesh Swain

BTech Student

Electronics and Communication Engineering

IIT Roorkee, Uttarakhand, India

Hey there! I’m an incoming PhD student in ECE at the University of Maryland, College Park, and currently an undergrad at IIT Roorkee studying Electronics and Communication Engineering. I spend most of my time thinking about how to make AI systems both powerful and safe. My research interests lie at the intersection of Mechanistic Interpretability, AI Safety, and Adversarial Robustness of Vision-Language Models and LLMs.

My first foray into interpretability research led to a paper on improving theoretical guarantees of Integrated Gradients attribution methods, accepted at the NeurIPS 2024 Interpretable AI Workshop. This was followed by an extensive study on adversarial vulnerabilities in VLMs – our reproducibility and enhancement work on Cross-Prompt Attacks was published in TMLR and received the Best Paper Award at MLRC 2025 (presented at Princeton). We further extended this into CroPA++, accepted at the NeurIPS 2025 Reliable ML Workshop, introducing three-fold enhancements that made attacks transferable across images and models.

Currently, I’m collaborating with Dr. Koustuv Sinha at META AI (FAIR) on benchmarking world model understanding and anticipation mechanisms in Video Language Models. I’m also working with Dr. Sanghamitra Dutta at the University of Maryland, College Park on investigating suppressed safety-critical features in LLM reasoning circuits and their causal impact on jailbreaks. At Virginia Tech, I’m developing user-intervenable LLM pipelines with Dr. Nagender Aneja, applying circuit-tracing interpretability methods for real-time model steering.

On the generative AI front, I developed RIGS – a lightweight Riemannian-guided diffusion framework for synthetic signal data generation in collaboration with BOSCH India (under review at IJCAI 2026). During my internship at AuraML, I built text-to-3D scene generation frameworks using Graph Diffusion Models for industrial simulation, contributing directly to their product AuraSim. I’ve also worked on multi-task RL with Diffusion Models at IIIT Hyderabad’s Robotics Research Centre, and on medical AI pipelines at IIT Bombay’s Koita Centre for Digital Health as part of the BharatGen consortium.

Beyond research, I lead the Data Science Group at IIT Roorkee as Joint Secretary, heading the research division. During my tenure, our members have published 15+ papers at venues like NeurIPS, CVPR, and ICLR, most of them led solely by undergraduate teams. I also serve as a reviewer for TMLR.

When I’m not debugging code or reading papers, you’ll probably find me at campus chai spots discussing the latest ML papers, or exploring Roorkee’s food scene. Feel free to reach out if you want to chat about research, collaborate on projects, or just grab a cup of chai!

news

Apr 19, 2026	Excited to share our new paper “The Encoding-Behavior Dissociation: How Distributed Safety Representations Yield Single-Direction Vulnerabilities in Vision-Language Models” – uncovering how VLM safety is encoded in high-dimensional representations yet gated by a single direction, exposing a limitation of current alignment. Submitted to TMLR 2026, currently under review. PDF
Mar 31, 2026	Our new paper “CAP: Counterfactual Activation Potential for Quantifying Suppressed Safety Features in Language Models” has been submitted to COLM 2026!
Mar 20, 2026	Received offers of admission to NYU's Master's programs in CSE from both the Courant and Tandon schools, each with a scholarship of $5,000 per year!
Feb 27, 2026	Received an offer of admission to the ECE PhD program at the University of Maryland, College Park!
Jan 19, 2026	Our new paper “Riemannian-Guided Diffusion for Scalable Synthetic Signal Data Generation” has been submitted to IJCAI 2026!
Jan 09, 2026	Received offers of admission to Northeastern University's MS programs in AI and CS, with two merit scholarships!
Dec 02, 2025	I'm at NeurIPS 2025! Find me to talk all things AI Safety or just to hang out.
Nov 01, 2025	Excited to finally start my collaboration with Prof. Sanghamitra Dutta from the University of Maryland, College Park! Our work will focus on investigating reasoning mechanisms inside LLMs that can help guard against jailbreaks.
Sep 30, 2025	My paper “CroPA++: Exposing Vulnerabilities in Vision Language Models and Enhancing Adversarial Transferability of Cross-Prompt Attacks” has been accepted at the NeurIPS Reliable ML Workshop, 2025!
Sep 15, 2025	Excited to start our research with Dr. Koustuv Sinha from META AI (FAIR), on evaluating world model understanding of VideoLMs!
Sep 09, 2025	Fortunate to be accepted by Professor Nagender Aneja at Virginia Tech to pursue applied interpretability research under his guidance. Our work involves designing user-intervenable reasoning agents for on-the-fly steering of LLMs.
Aug 21, 2025	Attending MLRC at Princeton University! Super excited to give our first oral presentation!
Aug 11, 2025	Our work “Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models” received the Best Paper Award at the Machine Learning Reproducibility Challenge (MLRC) at Princeton University! Catch the tweet here.
Jun 27, 2025	Revisiting CroPA has also been accepted at the Machine Learning Reproducibility Challenge.
Jun 16, 2025	Our work “Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models” has been accepted to TMLR!
Jun 02, 2025	I will be joining AuraML as a Research Intern to work on cutting-edge Generative 3D Vision for industrial simulation applications.
Apr 15, 2025	I will be joining the Robotics Research Centre, IIIT Hyderabad, as an undergraduate research intern for the summer.
Dec 10, 2024	Here at NeurIPS to attend my first ever in-person conference!
Oct 10, 2024	My debut paper “Riemann Sum Optimization for Accurate Integrated Gradients Computation” has been accepted to the NeurIPS 2024 Interpretable AI Workshop!
Apr 04, 2024	I will be joining the BharatGen Team at the Indian Institute of Technology (IIT) Bombay as a Machine Learning Research intern this summer.

selected publications

TMLR
Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models

Atharv Mittal*, Agam Pandey*, Swadesh Swain*, Amritanshu Tiwari, and Sukrit Jindal

Transactions on Machine Learning Research, Jun 2025

Best Paper Award at MLRC 2025

Large Vision-Language Models (VLMs) have revolutionized computer vision, enabling tasks such as image classification, captioning, and visual question answering. However, they remain highly vulnerable to adversarial attacks, particularly in scenarios where both visual and textual modalities can be manipulated. In this study, we conduct a comprehensive reproducibility study of "An Image is Worth 1000 Lies: Adversarial Transferability Across Prompts on Vision-Language Models", validating the Cross-Prompt Attack (CroPA) and confirming its superior cross-prompt transferability compared to existing baselines. Beyond replication, we propose several key improvements: (1) a novel initialization strategy that significantly improves Attack Success Rate (ASR); (2) an investigation of cross-image transferability by learning universal perturbations; (3) a novel loss function targeting vision encoder attention mechanisms to improve generalization. Our evaluation across prominent VLMs, including Flamingo, BLIP-2, and InstructBLIP, as well as extended experiments on LLaVA, validates the original results and demonstrates that our improvements consistently boost adversarial effectiveness. Our work reinforces the importance of studying adversarial vulnerabilities in VLMs and provides a more robust framework for generating transferable adversarial examples, with significant implications for understanding the security of VLMs in real-world applications.
@article{swain2025revisiting,
  title = {Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models},
  author = {Mittal, Atharv and Pandey, Agam and Swain, Swadesh and Tiwari, Amritanshu and Jindal, Sukrit},
  journal = {Transactions on Machine Learning Research},
  year = {2025},
  month = jun,
  openreview = {https://openreview.net/forum?id=5L90cl0xtf&noteId=7yskfQI5KW},
}