The paper introduces the Self-Supervised Semantic Bridge (SSB), a framework for unpaired image-to-image translation. SSB uses self-supervised visual encoders to build a shared latent space that captures geometric structure while remaining invariant to appearance, and it conditions diffusion bridges on these representations. This removes the need for a target-domain adversarial loss during training and for inversion into noise latents, the respective weak points of adversarial and diffusion-inversion methods. In experiments, SSB outperforms strong prior methods on medical image synthesis in both in-domain and out-of-domain settings, and it extends to high-quality text-guided editing.
Achieve spatially faithful image-to-image translation without cross-domain supervision by bridging diffusion models with self-supervised semantic representations.
Adversarial diffusion and diffusion-inversion methods have advanced unpaired image-to-image translation, but each faces key limitations. Adversarial approaches require target-domain adversarial loss during training, which can limit generalization to unseen data, while diffusion-inversion methods often produce low-fidelity translations due to imperfect inversion into noise-latent representations. In this work, we propose the Self-Supervised Semantic Bridge (SSB), a versatile framework that integrates external semantic priors into diffusion bridge models to enable spatially faithful translation without cross-domain supervision. Our key idea is to leverage self-supervised visual encoders to learn representations that are invariant to appearance changes but capture geometric structure, forming a shared latent space that conditions the diffusion bridges. Extensive experiments show that SSB outperforms strong prior methods for challenging medical image synthesis in both in-domain and out-of-domain settings, and extends easily to high-quality text-guided editing.
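The key idea above can be illustrated with a toy sketch. It is not the authors' implementation: `toy_semantic_encoder` is a hypothetical stand-in for a frozen self-supervised encoder (it only standardizes intensities, so it is invariant to a global gain and bias), and `bridge_sample` uses a standard Brownian bridge as one possible diffusion-bridge form. The abstract does not specify the bridge endpoints or the encoder, so both choices here are assumptions.

```python
import math
import random


def toy_semantic_encoder(image):
    """Hypothetical stand-in for a frozen self-supervised encoder.

    Standardizes pixel intensities, so the output does not change when a
    positive gain and a bias are applied to the whole image (a crude
    "appearance" change), while the spatial layout is preserved.
    """
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(var) or 1.0  # avoid division by zero on flat images
    return [[(p - mean) / std for p in row] for row in image]


def bridge_sample(x0, x1, t, sigma, rng):
    """Draw a Brownian-bridge sample between x0 (t=0) and x1 (t=1).

    The noise scale sigma * sqrt(t * (1 - t)) vanishes at both endpoints,
    so the sample is pinned to x0 at t=0 and to x1 at t=1.
    """
    scale = sigma * math.sqrt(t * (1.0 - t))
    return [
        [(1.0 - t) * a + t * b + scale * rng.gauss(0.0, 1.0)
         for a, b in zip(row0, row1)]
        for row0, row1 in zip(x0, x1)
    ]


# Same geometry rendered with two different "appearances" (assumed toy data).
shape = [[0.0, 0.0, 1.0],
         [0.0, 1.0, 1.0],
         [1.0, 1.0, 1.0]]
domain_a = shape
domain_b = [[3.0 * p + 10.0 for p in row] for row in shape]

# Both domains map to (numerically) the same code: a shared latent space.
code_a = toy_semantic_encoder(domain_a)
code_b = toy_semantic_encoder(domain_b)

# Assumption: the bridge runs from the shared code to the target image.
rng = random.Random(0)
x_mid = bridge_sample(code_a, domain_b, t=0.5, sigma=0.1, rng=rng)
```

In a real system the encoder would be a pretrained network and a learned denoiser would take the bridge state, the time step, and the semantic code as inputs; the sketch only shows why an appearance-invariant code can serve both domains without paired data.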