OmniJigsaw reveals a "bi-modal shortcut phenomenon" in joint audio-visual integration, demonstrating that naive fusion can be surprisingly ineffective and highlighting the importance of carefully designed cross-modal training strategies.
Doubling the token count of a ViT-based autoencoder, combined with staged compression and self-supervised pretraining, dramatically improves generative quality under deep compression without increasing the latent budget.
Diffusion language models can now efficiently self-evaluate their output quality by regenerating their own sequences, enabling more reliable uncertainty quantification and flexible-length generation.
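As a toy illustration of the regeneration idea (not the paper's actual algorithm), a self-consistency-style uncertainty estimate can be computed by resampling the same output several times and measuring per-position token agreement; the `samples` here are hypothetical regenerations, standing in for a diffusion LM's resampled sequences:

```python
from collections import Counter

def agreement_confidence(samples):
    """Per-position confidence: fraction of samples agreeing with the mode.

    `samples` is a list of equal-length token sequences, e.g. several
    regenerations of the same output by a (hypothetical) diffusion LM.
    """
    length = len(samples[0])
    confidences = []
    for pos in range(length):
        tokens = [s[pos] for s in samples]
        # Count how many samples agree with the most common token here.
        most_common_count = Counter(tokens).most_common(1)[0][1]
        confidences.append(most_common_count / len(samples))
    return confidences

# Three regenerations of a 4-token output; the position where the
# samples disagree ("sat" vs "lay") receives lower confidence.
samples = [
    ["the", "cat", "sat", "down"],
    ["the", "cat", "lay", "down"],
    ["the", "cat", "sat", "down"],
]
print(agreement_confidence(samples))
```

Low-agreement positions flag tokens the model is unsure about, which is one simple way regeneration can be turned into an uncertainty signal.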