Doubling the number of tokens in a ViT-based autoencoder, combined with staged compression and self-supervised pretraining, dramatically improves generative performance under deep compression without increasing the latent budget.
Forget training wheels: DeepScan unlocks significant gains in LVLM visual reasoning *without* any additional training, achieving state-of-the-art results on V*.