City University of Hong Kong, Baidu Inc
A lightweight architecture that uses visual tokens as dynamic queries to distill long textual sequences boosts LLM performance on 2D table understanding by 23.9%.
Multimodal embeddings get a serious upgrade with CoCoA, a new pre-training method that forces models to compress all input information into a single token for reconstruction, yielding substantial gains in embedding quality.