LLaVA-OV-2's codec-stream tokenization lets it outperform existing video-language models, especially on tasks requiring fine-grained temporal understanding of high-frequency motion.
RL fine-tuning of LMMs for tool use can cause structural formats to collapse because of strong pretrained tool priors, but a surprisingly simple fix that combines targeted format rewards with frame-budget randomization can restore stability and boost performance.