Xiamen University
PAR3D reveals that integrating part-aware representations can dramatically enhance 3D scene understanding, outperforming traditional object-centric models.
Multimodal models stumble badly on low-resource Southeast Asian languages, as revealed by the new SEA-Vision benchmark for document and scene text understanding.
Achieve state-of-the-art zero-shot camouflaged object segmentation by intelligently combining visual features, SAM, and MLLMs to overcome the limitations of relying solely on MLLMs for object discovery.