Multimodal large language models (MLLMs) stumble badly when asked to reason about safety in lab settings: their performance drops 32% compared with general-knowledge questions, revealing a critical gap for real-world deployment.