Multimodal models forget how to see and reason after SFT, but PRISM realigns them before RL, boosting performance by up to 6%.
Today's visual generation models excel at photorealism but still fail at the kind of spatial reasoning, long-term consistency, and causal understanding that truly intelligent visual generation demands.