MemDreamer narrows the performance gap with human experts to just 3.7 points while slashing the reasoning context window to a mere 2% of what full video ingestion requires.
Current MLLMs struggle with fine-grained spatial reasoning, achieving only 37.2 F1 on challenging tasks compared to human performance of 84.0 F1.
Foundation models struggle with spatial tasks, achieving only 12% success in reproducing target viewpoints, but a novel post-training framework boosts performance to over 51%.
Achieve superior camera control and visual fidelity in video re-rendering by training on unpaired real and synthetic data with a novel metric geometry reward.
OmniJigsaw reveals a "bi-modal shortcut phenomenon" in joint audio-visual integration, demonstrating that naive fusion can be surprisingly ineffective and highlighting the importance of carefully designed cross-modal training strategies.
Doubling the number of tokens in a ViT-based autoencoder, combined with staged compression and self-supervised pretraining, dramatically improves generative performance under deep compression, without increasing the latent budget.
Diffusion language models can now efficiently self-evaluate their output quality by regenerating their own sequences, enabling more reliable uncertainty quantification and flexible-length generation.