Surprisingly, high sparsity in video diffusion models doesn't degrade generation quality if the sparse mask accurately mimics the tile-wise geometry of full attention.
RL fine-tuning of LMMs for tool use can collapse structural formats because of strong pretrained tool priors, but a simple fix combining targeted format rewards with frame-budget randomization can restore stability and boost performance.
Today's visual generation models are often evaluated on the wrong things, leading to inflated performance claims that mask critical failures in spatial reasoning, temporal consistency, and causal understanding.