Ditch complex multimodal pre-training pipelines: GenLIP shows that a simple language modeling objective can effectively align vision encoders with LLMs, achieving strong performance with less data.
Forget text-centric pipelines: FlowInOne achieves state-of-the-art multimodal generation by unifying text, layouts, and instructions into a single visual flow, outperforming both open-source and commercial systems.