Forget fine-tuning: DynaEdit unlocks complex video edits like action modification and object insertion, all without training, by cleverly manipulating pretrained text-to-video models.
Reconstructing humans and their environments from multi-view video can now be done in a single pass, 8x faster, with no extra modules or preprocessing.
Forget fine-tuning: surprisingly, single-neuron activations in VLMs can be probed directly to build classifiers that outperform the full model, at a 5x speedup.
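The probing recipe is simple enough to sketch: record one hidden unit's scalar activation per image (e.g., via a forward hook on a VLM layer), then fit a one-feature classifier on those scalars. The data below is a stand-in, not the paper's setup:

```python
# Sketch of a single-neuron probe, assuming activations have already been
# extracted (e.g., via a PyTorch forward hook on one VLM layer). The data
# here is an illustrative stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts = rng.normal(size=1000)  # one scalar activation per image
labels = (acts + 0.5 * rng.normal(size=1000) > 0).astype(int)  # toy labels

probe = LogisticRegression()
probe.fit(acts.reshape(-1, 1), labels)  # a single feature: one neuron
print("probe accuracy:", probe.score(acts.reshape(-1, 1), labels))
```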
Multimodal web agents are surprisingly vulnerable to cross-modal attacks, but a novel adversarial training approach can double task-completion efficiency while mitigating these risks.
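For context, a single adversarial training step on the visual input typically looks like the sketch below. This is the generic FGSM-style pattern, not necessarily the paper's specific approach:

```python
# Generic adversarial-training step (FGSM-style) on an image input; purely
# illustrative of the defense pattern hinted at above.
import torch
import torch.nn.functional as F

def adversarial_step(model, optimizer, images, labels, eps=8 / 255):
    # Craft a one-step adversarial perturbation of the visual input.
    images = images.clone().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    grad, = torch.autograd.grad(loss, images)
    adv = (images + eps * grad.sign()).clamp(0, 1).detach()

    # Train on the perturbed batch so the model stays robust to it.
    optimizer.zero_grad()
    F.cross_entropy(model(adv), labels).backward()
    optimizer.step()
```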
DINOv2's impressive unimodal performance doesn't translate to cross-modal understanding, but a simple training tweak can align embeddings across RGB, depth, and segmentation without sacrificing feature quality.
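One common way to pull per-modality embeddings together is a symmetric contrastive term over matched pairs, sketched below. This is a generic InfoNCE-style recipe, not necessarily the specific tweak the paper proposes:

```python
# Illustrative cross-modal alignment loss: pull embeddings of the same scene
# together across modalities (here RGB vs. depth) with symmetric InfoNCE.
import torch
import torch.nn.functional as F

def align_loss(z_rgb, z_depth, temperature=0.07):
    z_rgb = F.normalize(z_rgb, dim=-1)
    z_depth = F.normalize(z_depth, dim=-1)
    logits = z_rgb @ z_depth.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(z_rgb.size(0), device=z_rgb.device)
    # Matched pairs sit on the diagonal; treat them as the positives.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```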
Achieve state-of-the-art ECG analysis by disentangling modality-specific biases and capturing spatiotemporal dependencies, outperforming existing multimodal approaches.
Existing deforestation monitoring maps misclassify smallholder agroforestry as "forest," risking unfair penalties under regulations like the EUDR.
Forget textual descriptions: this zero-shot image retrieval method hallucinates the target image directly, outperforming the state of the art by creating a whole synthetic world to match against.
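To make the "hallucinate, then match" idea concrete, here is a generic generate-then-retrieve sketch. The generator and encoder are assumed callables (e.g., a text-to-image model and a CLIP-style image encoder), not the paper's actual pipeline:

```python
# Generate-then-retrieve sketch: synthesize a guess at the target image,
# embed it, and rank the gallery by similarity. Interfaces are illustrative.
import numpy as np

def retrieve(query_text, gallery_embs, generator, encoder, k=5):
    fake_target = generator(query_text)  # hallucinate the target image
    q = encoder(fake_target)             # embed the synthetic image
    q = q / np.linalg.norm(q)
    sims = gallery_embs @ q              # gallery rows assumed unit-norm
    return np.argsort(-sims)[:k]         # indices of the top-k matches
```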
Ditch the high-fidelity simulator: IRL-VLA uses a lightweight reward world model trained with inverse reinforcement learning to enable efficient and effective closed-loop RL training for autonomous driving.
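A minimal sketch of what closed-loop training against a learned reward model can look like; the policy and reward-model interfaces below are assumptions for illustration, not IRL-VLA's actual code:

```python
# Closed-loop policy update against a learned reward world model, in place of
# a full simulator. REINFORCE-style update for simplicity; interfaces assumed.
import torch

def policy_gradient_step(policy, reward_model, optimizer, states):
    actions, log_probs = policy.sample(states)   # assumed policy API
    with torch.no_grad():
        rewards = reward_model(states, actions)  # reward model learned via IRL
    # Reinforce actions the reward model scores highly.
    loss = -(log_probs * rewards).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```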
LVLMs struggle to navigate cultural nuances, with even the best models achieving only 62% awareness and 38% compliance on a new benchmark spanning 16 countries.