This paper explores the use of diffusion model representations as an auxiliary learning signal during adversarial training (AT) to improve the robustness of image classifiers. The authors find that diffusion models offer diverse and partially robust representations, and that incorporating these representations during AT consistently improves robustness on CIFAR-10, CIFAR-100, and ImageNet. Their analysis suggests that this approach encourages more disentangled features, with diffusion representations and diffusion-generated synthetic data playing complementary roles.
Diffusion models aren't just good for generating synthetic data for robust training; their internal representations can be directly leveraged to boost adversarial robustness and disentangle features.
Incorporating diffusion-generated synthetic data into adversarial training (AT) has been shown to substantially improve the training of robust image classifiers. In this work, we extend the role of diffusion models beyond merely generating synthetic data, examining whether their internal representations, which encode meaningful features of the data, can provide additional benefits for robust classifier training. Through systematic experiments, we show that diffusion models offer representations that are both diverse and partially robust, and that explicitly incorporating diffusion representations as an auxiliary learning signal during AT consistently improves robustness across settings. Furthermore, our representation analysis indicates that incorporating diffusion models into AT encourages more disentangled features, while diffusion representations and diffusion-generated synthetic data play complementary roles in shaping representations. Experiments on CIFAR-10, CIFAR-100, and ImageNet validate these findings, demonstrating the effectiveness of jointly leveraging diffusion representations and synthetic data within AT.
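To make the idea of an "auxiliary learning signal" concrete, here is a minimal sketch of one way such an objective could be assembled for a single example. The abstract does not specify the loss, so everything here is an assumption for illustration: the cosine-distance alignment term, the `aux_weight` hyperparameter, and the function names are hypothetical and not taken from the paper. The adversarial logits and both feature vectors are assumed to be computed elsewhere (for example, by a PGD attack on the classifier and a frozen diffusion model).

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example from raw logits (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def cosine_distance(u, v, eps=1e-12):
    """Return 1 - cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def at_loss_with_diffusion_aux(adv_logits, label, clf_features,
                               diff_features, aux_weight=0.5):
    """Hypothetical objective: adversarial CE plus a weighted alignment term.

    adv_logits:    classifier logits on the adversarial input
    clf_features:  classifier features (assumed already projected to the
                   same dimension as the diffusion features)
    diff_features: features from a frozen diffusion model (assumption)
    aux_weight:    illustrative trade-off weight, not a value from the paper
    """
    ce = softmax_cross_entropy(adv_logits, label)
    aux = cosine_distance(clf_features, diff_features)
    return ce + aux_weight * aux
```

In this sketch, the auxiliary term vanishes when the classifier features point in the same direction as the diffusion features, leaving only the standard adversarial cross-entropy; the actual alignment mechanism used by the authors may differ.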