Shanghai AI Laboratory
EventVLA's foresight-driven memory mechanism boosts robotic manipulation success rates by 40% by dynamically capturing critical visual events before they become unobservable.
Sparse prefilling can dramatically accelerate long-context inference in diffusion language models, achieving up to 28x speedup without sacrificing quality.
LLMs are still far from being able to generate expert-level clinical guidelines, despite advances in deep research systems.
Attention sink, where Transformers fixate on seemingly irrelevant tokens, is more than a quirk: it is a fundamental challenge that affects training and inference and can even cause hallucinations, so understanding and mitigating it demands a systematic approach.
Achieve better compression in low-bit quantization by considering not just numerical sensitivity, but also the structural role of each layer.
Text-based speculative decoding falls flat for vision-language models, but ViSkip dynamically adapts to vision tokens for state-of-the-art acceleration.
LLMs can reason better if you force them to explore *different* ways of being right, not just be more random.