Jilin University, Shanghai Jiao Tong University, University of California at Merced
Current LLMs struggle to manage memory effectively in multimodal, multi-participant settings, revealing critical gaps in their design.
Multimodal large language models (MLLMs) can revolutionize video understanding by integrating watching, remembering, and reasoning into a cohesive framework that addresses long-range dependencies and sparse evidence.
UniSHARP achieves unprecedented photorealistic view synthesis across a continuum of camera systems, outperforming traditional methods by a substantial margin.
Treating geometry as a fundamental representational prerequisite, rather than a late-fusion auxiliary signal, significantly boosts spatio-temporal reasoning in vision-language models.
Motion-controlled video generation can now produce more plausible and natural results by reasoning about motion and its consequences, rather than rigidly following user-defined trajectories.