Don't fully retrain your draft model after fine-tuning your LLM: EDA restores speculative decoding performance with significantly less compute by adapting only a small, private component and regenerating training data.
Existing multimodal models struggle with multi-image reasoning, but a new benchmark and an inference-time attention fix expose and alleviate these shortcomings.
Steer frozen MLLMs to reason about specific image regions at test time, without any training, by optimizing visual prompts that guide cross-modal attention.