Ant Group
LVLMs can achieve SOTA visual reasoning by learning to "see" in a way that optimizes for reasoning, even if it means deviating from strict geometric accuracy.
Doubling the number of tokens in a ViT-based autoencoder, combined with staged compression and self-supervised pretraining, dramatically improves generative performance under deep compression, without increasing the latent budget.