This paper revisits autoregressive (AR) models for generative image classification and identifies the fixed token order of prior AR classifiers as a key limitation. The authors propose an order-marginalized AR classifier that uses any-order AR models to average predictions across multiple token orders, providing a more comprehensive signal for classification. The resulting classifier consistently outperforms diffusion-based classifiers across diverse image classification benchmarks while being up to 25x more efficient, and it is competitive with state-of-the-art self-supervised discriminative models.
Autoregressive generative classifiers can beat diffusion models at image classification, but only if you marginalize over token order.
Class-conditional generative models have emerged as accurate and robust classifiers, with diffusion models demonstrating clear advantages over other visual generative paradigms, including autoregressive (AR) models. In this work, we revisit visual AR-based generative classifiers and identify an important limitation of prior approaches: their reliance on a fixed token order, which imposes a restrictive inductive bias for image understanding. We observe that single-order predictions rely more on partial discriminative cues, while averaging over multiple token orders provides a more comprehensive signal. Based on this insight, we leverage recent any-order AR models to estimate order-marginalized predictions, unlocking the high classification potential of AR models. Our approach consistently outperforms diffusion-based classifiers across diverse image classification benchmarks, while being up to 25x more efficient. Compared to state-of-the-art self-supervised discriminative models, our method delivers competitive classification performance, a notable achievement for generative classifiers.
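To make the idea of an order-marginalized prediction concrete, here is a minimal sketch and not the authors' implementation. It assumes the class-conditional likelihood is estimated by Monte Carlo averaging over randomly sampled token orders, and that the class prior is uniform; the abstract does not specify either detail. The names `order_marginalized_posterior`, `log_mean_exp`, and `toy_log_lik` are hypothetical, and `toy_log_lik` is a hand-built stand-in for a real any-order AR model.

```python
import math
import random


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def order_marginalized_posterior(log_lik, tokens, classes, num_orders=8, seed=0):
    """Estimate p(class | tokens) by averaging likelihoods over token orders.

    log_lik(tokens, order, c) must return log p(tokens | c, order), i.e. the
    sum of per-token log-probabilities when tokens are decoded in `order`.
    Assumption: uniform class prior and Monte Carlo sampling of orders.
    """
    rng = random.Random(seed)
    n = len(tokens)
    # Sample the orders once so every class is scored on the same permutations.
    orders = [rng.sample(range(n), n) for _ in range(num_orders)]

    scores = {}
    for c in classes:
        per_order = [log_lik(tokens, order, c) for order in orders]
        scores[c] = log_mean_exp(per_order)

    # Bayes' rule with a uniform prior reduces to a softmax over the scores.
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}


def toy_log_lik(tokens, order, c):
    """Hypothetical stand-in for an any-order AR model over binary tokens.

    Class c "expects" tokens equal to c, and predictions become sharper as
    more tokens have already been revealed (later steps in the order).
    """
    total = 0.0
    for step, pos in enumerate(order):
        p_match = 0.6 + 0.3 * step / len(order)
        total += math.log(p_match if tokens[pos] == c else 1.0 - p_match)
    return total


if __name__ == "__main__":
    posterior = order_marginalized_posterior(
        toy_log_lik, tokens=[1, 1, 0, 1], classes=[0, 1], num_orders=8
    )
    print(posterior)
```

A fixed-order classifier corresponds to `num_orders=1` with a single hard-coded permutation such as raster order. Averaging in likelihood space (via `log_mean_exp`) is one possible choice; averaging log-likelihoods or per-order posteriors would be alternatives, and the abstract does not say which is used.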