Xiamen University
Current omni-modal LLMs can ace perception tasks but still fail at basic social interaction, such as knowing when and how to jump into a conversation.
By aligning attention patterns between intact and corrupted image processing paths, CrystaL crystallizes task-relevant visual semantics in MLLM latent spaces without needing extra annotations.