Keye-VL-2.0 achieves lossless processing of 256K contexts, advancing how we approach long-video understanding and agentic intelligence.
Personality induction improves image captioning in MLLMs but can hinder performance on reasoning tasks, revealing that its effects are task-dependent and call for tailored approaches.
Disentangling perception from reasoning with role-specific rewards in multimodal LLMs boosts accuracy by 7 points and exposes a critical bottleneck in existing joint-optimization approaches.
Get better image captions without more data: reinforcement learning can train vision-language models to focus on image details by maximizing the similarity between images retrieved using the generated captions.