MedSteer is introduced as a training-free activation-steering framework for endoscopic image synthesis using diffusion models, addressing the limitations of text prompting and inversion-based editing. It identifies pathology vectors in cross-attention layers based on contrastive prompt pairs and steers image activations along these vectors to generate counterfactual image pairs. Experiments on Kvasir v3 and HyperKvasir demonstrate that MedSteer achieves superior concept flip rates and structural preservation compared to inversion-based methods, and improves downstream polyp detection when used for data augmentation.
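The sketch below roughly illustrates the steering step described above. It builds a concept direction from activations recorded under a contrastive prompt pair, then shifts image-token activations along that direction. It is a minimal sketch under stated assumptions, not MedSteer's implementation. The difference-of-means vector, the unit normalization, the additive update with strength `alpha`, and the helper names `pathology_vector` and `steer` are all hypothetical. Random arrays stand in for real cross-attention outputs.

```python
import numpy as np


def pathology_vector(acts_pos: np.ndarray, acts_neg: np.ndarray) -> np.ndarray:
    """Hypothetical concept direction: difference of mean activations.

    acts_pos / acts_neg have shape (num_tokens, dim) and stand in for
    cross-attention outputs recorded under the "with concept" and
    "without concept" prompts. Returns a unit vector of shape (dim,).
    """
    v = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("prompts produced identical mean activations")
    return v / norm


def steer(acts: np.ndarray, v: np.ndarray, alpha: float) -> np.ndarray:
    """Shift every image-token activation along v by strength alpha (assumed additive rule)."""
    return acts + alpha * v  # broadcasts (tokens, dim) + (dim,)


rng = np.random.default_rng(0)
dim = 8
concept = np.zeros(dim)
concept[0] = 3.0  # hypothetical offset that the "positive" prompt adds

acts_neg = rng.normal(size=(16, dim))
acts_pos = rng.normal(size=(16, dim)) + concept
v = pathology_vector(acts_pos, acts_neg)

image_acts = rng.normal(size=(64, dim))  # stand-in image-token activations
steered = steer(image_acts, v, alpha=2.0)
delta = steered - image_acts

# Each token moves by exactly alpha along v and not at all orthogonal to it.
print(np.allclose(delta @ v, 2.0))  # True
```

Here the additive update only changes the component of each activation along `v`. This is one simple way to realize "change only the steered concept" in activation space. How MedSteer selects layers, timesteps, and steering strength is not covered by this sketch.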
Forget re-prompting or inversion: MedSteer lets you surgically control endoscopic image synthesis by steering diffusion model activations, creating matched counterfactual pairs with concept flip rates of up to 95%.
Generative diffusion models are increasingly used for medical imaging data augmentation, but text prompting cannot produce causal training data. Re-prompting rerolls the entire generation trajectory, altering anatomy, texture, and background. Inversion-based editing methods introduce reconstruction error that causes structural drift. We propose MedSteer, a training-free activation-steering framework for endoscopic synthesis. MedSteer identifies a pathology vector for each contrastive prompt pair in the cross-attention layers of a diffusion transformer. At inference time, it steers image activations along this vector, generating counterfactual pairs from scratch in which the only difference is the steered concept. All other structure is preserved by construction. We evaluate MedSteer in three experiments on Kvasir v3 and HyperKvasir. On counterfactual generation across three clinical concept pairs, MedSteer achieves flip rates of 0.800, 0.925, and 0.950, outperforming the best inversion-based baseline in both concept flip rate and structural preservation. On dye disentanglement, MedSteer achieves 75% dye removal, compared with 20% for PnP and 10% for h-Edit. On downstream polyp detection, augmenting with MedSteer counterfactual pairs yields a ViT AUC of 0.9755 versus 0.9083 for quantity-matched re-prompting, confirming that counterfactual structure drives the gain. Code is available at https://github.com/phamtrongthang123/medsteer
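A concept flip rate, such as the 0.800, 0.925, and 0.950 reported above, is broadly the share of generated pairs in which the steered concept actually appears. The sketch below assumes one plausible pair-level definition. A pair counts as flipped only when a classifier assigns the source concept to the unsteered image and the target concept to the steered image. The abstract does not spell out MedSteer's exact protocol or classifier, so the function `concept_flip_rate`, its arguments, and the example predictions are all hypothetical.

```python
from typing import Sequence


def concept_flip_rate(
    source_preds: Sequence[int],
    steered_preds: Sequence[int],
    source_label: int,
    target_label: int,
) -> float:
    """Hypothetical pair-level flip rate.

    A pair is flipped when the classifier predicts source_label for the
    unsteered image and target_label for its steered counterpart.
    """
    if len(source_preds) != len(steered_preds):
        raise ValueError("each unsteered image needs a steered partner")
    if len(source_preds) == 0:
        raise ValueError("no pairs to score")
    flips = sum(
        s == source_label and t == target_label
        for s, t in zip(source_preds, steered_preds)
    )
    return flips / len(source_preds)


# Hypothetical classifier outputs for five pairs (0 = concept absent, 1 = present).
source_preds = [0, 0, 0, 0, 0]
steered_preds = [1, 1, 0, 1, 1]
rate = concept_flip_rate(source_preds, steered_preds, source_label=0, target_label=1)
print(rate)  # 0.8
```

Requiring the unsteered image to be classified as the source concept keeps pairs that never started from the source concept from counting as flips. A looser variant would score only the steered images.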