Human-inspired context sensitivity boosts visual reasoning in machines, closing the gap between AI and human perception.
LLMs' struggle to grasp subtext—even generating literal clues 60% of the time—reveals a critical gap in their ability to understand nuanced human communication.
Forget fine-tuning: DynaEdit unlocks complex video edits like action modification and object insertion, all without training, using clever manipulation of pretrained text-to-video models.
Forget fine-tuning: surprisingly, single-neuron activations in VLMs can be directly probed to create classifiers that outperform the full model, with 5x speedups.
DINOv2's impressive unimodal performance doesn't translate to cross-modal understanding, but a simple training tweak can align embeddings across RGB, depth, and segmentation without sacrificing feature quality.
Tri-modal masked diffusion models can now be trained from scratch, achieving strong results in text generation, text-to-image, and text-to-speech, thanks to a systematic exploration of the design space and a novel SDE-based batch size reparameterization.
Existing deforestation monitoring maps misclassify smallholder agroforestry as "forest," risking unfair penalties under regulations like the EUDR.
Forget textual descriptions – this zero-shot image retrieval method hallucinates the target image directly, outperforming the state of the art by creating a whole synthetic world to match against.