Skip the costly full training runs: this new metric accurately predicts face recognition dataset quality using only lightweight proxy models.
Uniformly quantizing the entire diffusion action head of a VLA to W4A4 is not only feasible but can match or exceed FP16 performance, defying conventional wisdom and cutting the memory footprint by 71%.
LLaVA-OV-2's codec-stream tokenization lets it outperform existing video-language models by a wide margin, especially on tasks requiring fine-grained temporal understanding of high-frequency motion.
Forget generic retrieval signals: UniDoc-RL uses reinforcement learning to teach LVLMs how to actively perceive and reason about visual information, yielding a 17.7% performance boost.