The Ohio State University
DLA adapts memory management in linear attention, preserving crucial information while reducing error accumulation over long sequences.
LLM agents struggle to generalize from experience to reusable skills, often performing worse than simply replaying past trajectories, revealing a critical gap in current abstraction methods.
Attention Sink, where Transformers fixate on seemingly irrelevant tokens, is more than a quirk: it is a fundamental challenge that affects training and inference and can even cause hallucinations, so understanding and mitigating it demands a systematic approach.
Text-based speculative decoding falls flat for vision-language models, but ViSkip dynamically adapts to vision tokens for state-of-the-art acceleration.
LLMs can reason better if you force them to explore *different* ways of being right, not just be more random.