Stanford HAI · Feb 18, 2026 · arXiv:2602.16664

Unpaired Image-to-Image Translation via a Self-Supervised Semantic Bridge

Jiaming Liu, Felix Petersen, Yunhe Gao, Yabin Zhang, Hyojin Kim, Akshay S. Chaudhari, Yu Sun, Stefano Ermon, Sergios Gatidis

AI Summary

The paper introduces the Self-Supervised Semantic Bridge (SSB), a framework for unpaired image-to-image translation. SSB uses self-supervised visual encoders to build a shared latent space that captures geometric structure while remaining invariant to appearance, and conditions diffusion bridges on these representations. This removes the need for a target-domain adversarial loss or for inverting images into noise latents, addressing the main limitations of adversarial and diffusion-inversion methods. In experiments, SSB outperforms strong prior methods on medical image synthesis in both in-domain and out-of-domain settings, and extends to high-quality text-guided editing.
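One common form of diffusion bridge is the Brownian bridge, whose noisy state at time t is pinned to fixed endpoints at t = 0 and t = 1. The sketch below shows how a training example for a conditional bridge could be formed. It is an illustration under stated assumptions, not the authors' implementation: the Brownian-bridge marginal, the uniform sampling of t, the choice of the t = 1 endpoint as the regression target, and the names `brownian_bridge_sample` and `make_training_example` are all assumptions, and what SSB actually uses as endpoints, noise schedule, and parameterization may differ. The vector `z` stands in for the appearance-invariant semantic representation passed to the denoiser as conditioning.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def brownian_bridge_sample(x0: Vector, x1: Vector, t: float, sigma: float,
                           rng: random.Random) -> Vector:
    """Sample x_t from a Brownian bridge pinned at x0 (t = 0) and x1 (t = 1).

    Marginal assumed here (not taken from the paper):
        x_t = (1 - t) * x0 + t * x1 + sigma * sqrt(t * (1 - t)) * eps,  eps ~ N(0, I)
    The noise standard deviation is zero at both endpoints and largest at t = 0.5.
    """
    if len(x0) != len(x1):
        raise ValueError("bridge endpoints must have the same dimension")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    std = sigma * math.sqrt(t * (1.0 - t))
    return [(1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x0, x1)]


def make_training_example(x0: Vector, x1: Vector, z: Vector, sigma: float,
                          rng: random.Random) -> Tuple[Vector, float, Vector, Vector]:
    """Build one (input, target) pair for a hypothetical conditional bridge denoiser.

    x0, x1: the two bridge endpoints (their meaning in SSB is not specified here)
    z:      semantic representation passed to the denoiser as conditioning
    Returns (x_t, t, z, target); a denoiser f(x_t, t, z) would be regressed onto
    target, assumed here to be the t = 1 endpoint.
    """
    t = rng.uniform(0.0, 1.0)
    x_t = brownian_bridge_sample(x0, x1, t, sigma, rng)
    return x_t, t, z, x1


if __name__ == "__main__":
    rng = random.Random(0)
    x_t, t, z, target = make_training_example(
        x0=[0.0, 0.0, 0.0], x1=[1.0, 2.0, 3.0], z=[0.5, -0.5], sigma=1.0, rng=rng)
    print(f"t = {t:.3f}, x_t = {[round(v, 3) for v in x_t]}, target = {target}")
```

Because the noise term is scaled by sqrt(t(1 - t)), x_t coincides with x0 at t = 0 and with x1 at t = 1. In this sketch, conditioning on `z` is the channel through which the denoiser would receive the structural information the translation is meant to preserve.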

Key Contribution

Achieves spatially faithful image-to-image translation without cross-domain supervision by conditioning diffusion bridge models on self-supervised semantic representations.

Abstract

Adversarial diffusion and diffusion-inversion methods have advanced unpaired image-to-image translation, but each faces key limitations. Adversarial approaches require target-domain adversarial loss during training, which can limit generalization to unseen data, while diffusion-inversion methods often produce low-fidelity translations due to imperfect inversion into noise-latent representations. In this work, we propose the Self-Supervised Semantic Bridge (SSB), a versatile framework that integrates external semantic priors into diffusion bridge models to enable spatially faithful translation without cross-domain supervision. Our key idea is to leverage self-supervised visual encoders to learn representations that are invariant to appearance changes but capture geometric structure, forming a shared latent space that conditions the diffusion bridges. Extensive experiments show that SSB outperforms strong prior methods for challenging medical image synthesis in both in-domain and out-of-domain settings, and extends easily to high-quality text-guided editing.
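The property the abstract asks of the encoder (invariant to appearance, sensitive to geometry) can be illustrated with a hand-crafted toy rather than a learned model: z-scoring an image's pixels gives a feature that is unchanged under a global brightness/contrast change (I → aI + b with a > 0) but changes when the content is spatially shifted. A self-supervised encoder learns approximate invariance to far richer appearance changes, so this is only an analogy; `toy_structure_encoder`, `change_appearance`, `shift_right`, and `l2_distance` are hypothetical names, not part of SSB.

```python
import math
from typing import List

Image = List[List[float]]


def toy_structure_encoder(img: Image) -> List[float]:
    """Toy stand-in for an appearance-invariant encoder (hypothetical).

    Z-scores all pixels and flattens the result. The output is unchanged
    (up to floating-point error) under a global affine intensity change
    I -> a * I + b with a > 0, but it still depends on where values sit
    spatially, so geometric changes alter it.
    """
    flat = [p for row in img for p in row]
    n = len(flat)
    mean = sum(flat) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in flat) / n)
    if std == 0.0:
        # A constant image carries no structure; map it to all zeros.
        return [0.0] * n
    return [(p - mean) / std for p in flat]


def change_appearance(img: Image, gain: float, bias: float) -> Image:
    """Global brightness/contrast change: an 'appearance' edit."""
    return [[gain * p + bias for p in row] for row in img]


def shift_right(img: Image) -> Image:
    """Circularly shift each row one pixel to the right: a 'geometric' edit."""
    return [[row[-1]] + row[:-1] for row in img]


def l2_distance(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    image = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
    z = toy_structure_encoder(image)
    z_app = toy_structure_encoder(change_appearance(image, gain=2.5, bias=-10.0))
    z_geo = toy_structure_encoder(shift_right(image))
    print(f"appearance change: {l2_distance(z, z_app):.2e}")
    print(f"geometric change:  {l2_distance(z, z_geo):.2f}")
```

The appearance edit leaves the features essentially identical, while the one-pixel shift moves them substantially. That asymmetry is what makes such features usable as a shared latent across domains whose intensities differ but whose anatomy or layout should be preserved.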

Computer Vision · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 100

Year: 2026

Venue: N/A
