Mar 7, 2026 · arXiv:2603.07066

MedSteer: Counterfactual Endoscopic Synthesis via Training-Free Activation Steering

Trong-Thang Pham, Loc Nguyen, Anh Nguyen, Hien Nguyen, Ngan Le

AI Summary

MedSteer is introduced as a training-free activation-steering framework for endoscopic image synthesis using diffusion models, addressing the limitations of text prompting and inversion-based editing. It identifies pathology vectors in cross-attention layers based on contrastive prompt pairs and steers image activations along these vectors to generate counterfactual image pairs. Experiments on Kvasir v3 and HyperKvasir demonstrate that MedSteer achieves superior concept flip rates and structural preservation compared to inversion-based methods, and improves downstream polyp detection when used for data augmentation.

Key Contribution

Instead of re-prompting or inversion-based editing, MedSteer steers diffusion-model activations to generate endoscopic counterfactual pairs from scratch that differ only in the steered concept, reaching concept flip rates of 0.800, 0.925, and 0.950 across three clinical concept pairs.

Abstract

Generative diffusion models are increasingly used for medical imaging data augmentation, but text prompting cannot produce causal training data. Re-prompting rerolls the entire generation trajectory, altering anatomy, texture, and background. Inversion-based editing methods introduce reconstruction error that causes structural drift. We propose MedSteer, a training-free activation-steering framework for endoscopic synthesis. MedSteer identifies a pathology vector for each contrastive prompt pair in the cross-attention layers of a diffusion transformer. At inference time, it steers image activations along this vector, generating counterfactual pairs from scratch where the only difference is the steered concept. All other structure is preserved by construction. We evaluate MedSteer across three experiments on Kvasir v3 and HyperKvasir. On counterfactual generation across three clinical concept pairs, MedSteer achieves flip rates of 0.800, 0.925, and 0.950, outperforming the best inversion-based baseline in both concept flip rate and structural preservation. On dye disentanglement, MedSteer achieves 75% dye removal against 20% (PnP) and 10% (h-Edit). On downstream polyp detection, augmenting with MedSteer counterfactual pairs achieves a ViT AUC of 0.9755 versus 0.9083 for quantity-matched re-prompting, confirming that counterfactual structure drives the gain. Code is available at https://github.com/phamtrongthang123/medsteer
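To make the idea of steering along a concept vector concrete, here is a minimal sketch of generic difference-of-means activation steering on plain Python lists. It is an illustration under stated assumptions, not the MedSteer implementation: the abstract does not specify how the pathology vector is computed or applied, so the difference-of-means estimate, the additive update, and the names `concept_direction`, `steer`, and `strength` are all hypothetical.

```python
import math


def mean_vector(rows):
    """Average a list of equal-length activation vectors, component by component."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def concept_direction(pos_acts, neg_acts):
    """Assumed estimator: unit difference of mean activations between
    two contrastive prompts (for example, 'polyp' versus 'normal')."""
    mean_pos = mean_vector(pos_acts)
    mean_neg = mean_vector(neg_acts)
    return unit([p - q for p, q in zip(mean_pos, mean_neg)])


def steer(activation, direction, strength):
    """Assumed update rule: shift the activation along the unit direction.
    A negative strength pushes away from the positive concept."""
    return [a + strength * d for a, d in zip(activation, direction)]


# Toy 3-dimensional activations; real cross-attention activations are far larger.
pos_acts = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
neg_acts = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = concept_direction(pos_acts, neg_acts)
steered = steer([0.5, 2.0, -1.0], direction, strength=-0.5)
print(direction, steered)
```

In this toy case the two prompts differ only in the first coordinate, so the update changes only that coordinate and leaves the others untouched. That loosely mirrors the claim that everything except the steered concept is preserved, although in a real diffusion transformer the shift would be applied inside the model at each affected layer rather than to a single vector.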

Computer Vision · Data Curation & Synthetic Data · Scientific Discovery & Drug Design

Citation Metrics

Citations: 0
Influential citations: 0
References: 35
Year: 2026
Venue: N/A
