Search papers, labs, and topics across Lattice.
Peking University
2
0
3
Multi-modal models fail to maintain coherent memory across video streams, revealing fundamental weaknesses in their memory architectures.
Visual inputs can hijack the moral compass of VLMs, causing them to abandon carefully tuned text-based safety protocols and make surprisingly unethical decisions.