Search papers, labs, and topics across Lattice.
Xi'an Jiaotong-Liverpool University
4
0
6
Multimodal fusion can actually *hurt* performance in medical imaging tasks like predicting visual field loss, unless you carefully address data imbalance and modality conflict.
Bitstream-corrupted video restoration remains a significant challenge, even with recent advances, as revealed by the NTIRE 2026 challenge results.
Adaptive expert collaboration and holistic token learning can significantly boost performance in fine-grained multimodal visual analytics tasks like driver action recognition.
LVLMs can get a 16.5%-29.5% boost on visual reasoning benchmarks just by iteratively checking their work against the image, no training needed.