Search papers, labs, and topics across Lattice.
2
0
5
Multimodal embeddings get a serious upgrade with CoCoA, a new pre-training method that forces models to compress all input information into a single token for reconstruction, leading to substantial quality gains.
By explicitly modeling event-level causal relationships in videos via scene graphs, GraphThinker significantly reduces hallucinations in video reasoning tasks.