Search papers, labs, and topics across Lattice.
Peking University
2
0
4
Multi-modal models fail to maintain coherent memory across video streams, revealing fundamental weaknesses in their memory architectures.
MLLMs can now detect tampered text in images with significantly improved accuracy and reduced reliance on expensive annotations, thanks to a novel reinforcement learning approach.