BabelRS disentangles modality alignment from downstream task learning in multi-modal remote sensing, leading to more stable training and improved detection accuracy.
By aligning attention patterns between intact and corrupted image processing paths, CrystaL crystallizes task-relevant visual semantics in MLLM latent spaces without needing extra annotations.
Steer frozen MLLMs to reason about specific image regions at test time, without any training, by optimizing visual prompts that guide cross-modal attention.
Expert-annotated geographic reasoning data and specialized rewards unlock superior geolocation performance in RL agents, outperforming even large VLLMs.
Forget incomplete video descriptions: a new dataset and training pipeline yields video captioning models that are competitive with Gemini-3-Pro while reducing hallucinations.