Achieve superior audio-visual generation with a 6.3B-parameter model by disentangling alignment and generation, outperforming larger models.
Multimodal models can better understand degraded images by dropping pixel-perfect reconstruction and directly optimizing the latent space for reasoning, leading to higher perceptual quality.
Forget scaling depth and width: MOUE unlocks a new "virtual width" dimension for Mixture-of-Experts by cleverly reusing a single expert pool across layers.