Fixed guidance weights in diffusion models are suboptimal: C$^2$FG offers a training-free, theoretically grounded way to adjust guidance strength dynamically, improving performance across diverse generative tasks.
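To make the mechanism concrete, here is a minimal sketch of classifier-free guidance with a step-dependent weight in place of a fixed one. The linear schedule and all names below are illustrative stand-ins, not C$^2$FG's actual theoretically derived rule:

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance: blend unconditional and
    conditional noise predictions with weight w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def dynamic_weight(step, total_steps, w_lo=1.5, w_hi=7.5):
    """Hypothetical schedule: strong guidance at high-noise steps, relaxing
    toward the end. C^2FG derives its weight theoretically; this linear
    ramp only marks where a dynamic weight plugs in."""
    frac = step / max(total_steps - 1, 1)
    return w_hi - (w_hi - w_lo) * frac

# Toy sampling loop skeleton; the denoiser is stubbed with random noise.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))
for step in range(50):
    eps_u, eps_c = rng.standard_normal((2, 4, 4))  # stand-ins for model outputs
    w = dynamic_weight(step, 50)                   # per-step weight, not fixed
    x = x - 0.02 * guided_noise(eps_u, eps_c, w)   # placeholder update rule
```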
Unifying generation and understanding in multimodal models often *hurts* performance on understanding tasks, except for spatial reasoning, visual illusions, and multi-round reasoning, challenging the assumption that generation universally improves understanding.
Forget same-family constraints: you can compress prompts for LLaMA with a Qwen draft model and still get 90-100% of the original performance.
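The mechanism is perplexity-style compression: a small draft model scores each span, and predictable spans get dropped before the prompt reaches the target LLM. A minimal sketch, with a dictionary standing in for the draft model's scores (the paper's exact criterion may differ):

```python
def compress_prompt(sentences, score_fn, keep_ratio=0.5):
    """Keep the sentences the draft model finds most surprising (highest
    per-token loss), assuming predictable text is safe to drop; original
    order is preserved. Mirrors perplexity-based compressors like LLMLingua."""
    k = max(1, int(len(sentences) * keep_ratio))
    kept = set(sorted(sentences, key=score_fn, reverse=True)[:k])
    return " ".join(s for s in sentences if s in kept)

# Stand-in for per-token loss under a small draft model (e.g. Qwen); a real
# scorer runs the draft model, which need not share the target's tokenizer.
toy_nll = {
    "The meeting is at noon.": 2.1,
    "As mentioned, the meeting is at noon.": 0.6,  # predictable -> droppable
    "Budget approval needs the CFO's signature by Friday.": 2.8,
}
print(compress_prompt(list(toy_nll), toy_nll.get, keep_ratio=0.67))
```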
Knowledge Graph Completion gets a boost: KGT's dedicated entity tokens and decoupled prediction heads let LLMs reason about KGs without being constrained by token fragmentation.
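The core idea is easy to picture in code: one dedicated embedding per entity on the input side, and a prediction head over the entity vocabulary (not the subword vocabulary) on the output side. A hedged PyTorch sketch with illustrative names and sizes, not KGT's actual code:

```python
import torch
import torch.nn as nn

class KGEntityModule(nn.Module):
    """Whole-entity tokens plus a decoupled head, as described above."""
    def __init__(self, hidden_dim, num_entities):
        super().__init__()
        # Input side: one dedicated embedding per KG entity, so an entity is
        # a single token rather than a handful of subword fragments.
        self.entity_emb = nn.Embedding(num_entities, hidden_dim)
        # Output side: a prediction head over the entity vocabulary,
        # decoupled from the LLM's text head.
        self.entity_head = nn.Linear(hidden_dim, num_entities)

    def embed(self, entity_ids):
        return self.entity_emb(entity_ids)

    def predict(self, hidden_state):
        return self.entity_head(hidden_state)

mod = KGEntityModule(hidden_dim=768, num_entities=10_000)
h = mod.embed(torch.tensor([42]))          # inject entity 42 into the input
logits = mod.predict(torch.randn(1, 768))  # rank all entities for (h, r, ?)
print(logits.shape)                        # torch.Size([1, 10000])
```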
Image-to-video models can now generate more consistent and expressive animated storyboards than static diffusion models, thanks to a Disney-inspired multi-agent framework.
A video reasoning dataset 1000x larger than prior ones reveals early signs of emergent generalization, offering a new foundation for training and evaluating spatiotemporal AI.
ReAct agents can now automate feature engineering, outperforming existing methods on tabular data tasks by iteratively discovering and evaluating features.
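Stripped to its skeleton, this is ReAct's propose-observe-update cycle with candidate features as actions and validation scores as observations. A toy sketch (the real agent uses an LLM to reason about the next candidate; `propose` and `evaluate` here are stand-ins):

```python
import random

def react_feature_loop(propose, evaluate, n_steps=8, seed=0):
    """Each step proposes a candidate feature (the action), scores the model
    with it added (the observation), and keeps it only on improvement."""
    rng = random.Random(seed)
    features, best = [], evaluate([])
    for _ in range(n_steps):
        candidate = propose(features, rng)        # "thought/action"
        score = evaluate(features + [candidate])  # "observation"
        if score > best:                          # update agent state
            features.append(candidate)
            best = score
    return features, best

# Toy stand-ins: the evaluator rewards two specific features, simulating a
# validation metric; a real evaluate() would cross-validate a model.
CANDIDATES = ["log(income)", "age*income", "is_weekend", "zip_freq"]
propose = lambda feats, rng: rng.choice([c for c in CANDIDATES if c not in feats] or CANDIDATES)
evaluate = lambda feats: 0.70 + 0.05 * ("log(income)" in feats) + 0.03 * ("zip_freq" in feats)

print(react_feature_loop(propose, evaluate))  # keeps only score-improving features
```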
LLMs can't reliably debug code in long contexts (64k-128k tokens) even when given perfect information retrieval, though agentic workflows that decompose the task still perform impressively.
Current multimodal agents are surprisingly bad at web browsing, achieving only 36% accuracy on a new benchmark designed to test deep, multimodal reasoning across web pages.
Turns out, skipping the boring parts of a video (like static backgrounds) makes your vision AI both faster and smarter, beating state-of-the-art models with less data.
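The underlying trick can be approximated in a few lines: measure per-frame change and skip frames below a threshold, so static backgrounds never reach the model. The paper's selection is presumably learned and token-level; this pixel-difference heuristic just shows the shape of the idea:

```python
import numpy as np

def drop_static_frames(frames, threshold=4.0):
    """Keep a frame only if it differs enough (mean abs pixel diff) from the
    last kept frame, so static stretches are skipped."""
    kept = [frames[0]]
    for f in frames[1:]:
        diff = np.abs(f.astype(np.float32) - kept[-1].astype(np.float32)).mean()
        if diff > threshold:
            kept.append(f)
    return kept

# Toy clip: 10 black frames with "motion" injected at frames 4 and 8.
rng = np.random.default_rng(1)
clip = [np.zeros((8, 8), dtype=np.uint8) for _ in range(10)]
clip[4] = rng.integers(0, 255, (8, 8), dtype=np.uint8)
clip[8] = rng.integers(0, 255, (8, 8), dtype=np.uint8)
print(len(drop_static_frames(clip)))  # 5: the static runs are skipped
```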