LMMs can learn to generate images *and* improve their understanding abilities, without catastrophic forgetting, by carefully disentangling and sharing experts within a MoE architecture.
Achieve SOTA in both visual generation and understanding by harmonizing generative and semantic representations within a single ViT architecture.
Ditch discrete visual tokens: UniCom achieves SOTA multimodal generation by compressing continuous semantic representations, unlocking better controllability and consistency in image editing.