Unlock compositional reasoning in 3D vision-language models with a new dataset that maps noun phrases to 3D instances, yielding improvements on both fine-grained and traditional segmentation tasks.
Steer frozen MLLMs to reason about specific image regions at test time, without any training, by optimizing visual prompts that guide cross-modal attention.
Achieve state-of-the-art zero-shot camouflaged object segmentation by combining visual features, SAM, and MLLMs, overcoming the limitations of relying solely on MLLMs for object discovery.