Chain-of-thought reasoning in multimodal LLMs breaks down when scenarios shift even slightly, but surprisingly, mixing text formats into the reasoning process helps them generalize better than reasoning over images does.
LVLMs can be subtly backdoored via manipulated images, letting attackers inject targeted messages into multi-turn conversations and steer users once a specific trigger appears.