Shanghai Jiao Tong University
LLMs waste compute on tokens that have already "figured it out" – DASH selectively skips these tokens during prefill, speeding things up without retraining or sacrificing accuracy.
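A minimal sketch of the general idea of skipping "saturated" tokens during prefill. This is not DASH's actual criterion or code: here saturation is assumed to mean a token's hidden state barely changing between layers (measured by cosine similarity), attention is omitted, and the layer is a toy feed-forward block.

```python
import torch
import torch.nn as nn

class ToyLayer(nn.Module):
    """Stand-in for a transformer block (attention omitted for simplicity)."""
    def __init__(self, d):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.norm = nn.LayerNorm(d)

    def forward(self, x):
        return x + self.ff(self.norm(x))

@torch.no_grad()
def prefill_with_skipping(layers, hidden, sim_threshold=0.999):
    """hidden: (batch, seq, d). Tokens whose representation stops changing
    are frozen and excluded from computation in the remaining layers."""
    active = torch.ones(hidden.shape[:2], dtype=torch.bool, device=hidden.device)
    for layer in layers:
        updated = hidden.clone()
        if active.any():
            # Only recompute the tokens that are still "thinking".
            updated[active] = layer(hidden[active])
        # Assumed saturation heuristic: mark a token as done once its
        # hidden state barely moves between consecutive layers.
        sim = torch.cosine_similarity(updated, hidden, dim=-1)
        active = active & (sim < sim_threshold)
        hidden = updated
    return hidden

d = 64
layers = nn.ModuleList([ToyLayer(d) for _ in range(8)])
out = prefill_with_skipping(layers, torch.randn(2, 16, d))
print(out.shape)  # torch.Size([2, 16, 64])
```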
Targeted neuron fine-tuning can unlock superior image translation capabilities in multimodal large language models, outperforming traditional methods by preserving pre-trained knowledge.
LLMs can achieve more consistent and reliable cross-jurisdictional financial reporting by acting as constrained verifiers within a structured, agentic workflow, rather than as free-form generators.
Current phone-use agents are often *too* helpful, routinely violating user privacy by filling in unnecessary personal information even when a task doesn't require it.
Unified multimodal models secretly contain separate inference pathways for generation and understanding, and FlashU unlocks this hidden potential for 2x speedup without retraining.
Dataset distillation gets a boost on long-tailed data with CSDM, which uses spectral distribution matching to prioritize realism in tail classes and achieves a 14% improvement over SOTA methods.
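A hedged sketch of what spectral distribution matching can look like, not the CSDM implementation: align the eigenvalue spectrum of the synthetic data's feature covariance with that of the real data for a given (e.g., tail) class. The feature dimensions, loss form, and per-class weighting are illustrative assumptions; in practice the per-class losses could be reweighted to emphasize tail classes.

```python
import torch
import torch.nn.functional as F

def spectral_matching_loss(real_feats, syn_feats, eps=1e-6):
    """real_feats, syn_feats: (n, d) feature matrices for one class.
    Matches the eigenvalue spectra of their feature covariances."""
    def spectrum(f):
        f = f - f.mean(dim=0, keepdim=True)
        cov = f.T @ f / max(f.shape[0] - 1, 1)
        # eigvalsh returns eigenvalues of the symmetric covariance, ascending.
        return torch.linalg.eigvalsh(cov + eps * torch.eye(cov.shape[0]))
    return F.mse_loss(spectrum(syn_feats), spectrum(real_feats))

# Toy usage: optimize synthetic tail-class features toward the real spectrum.
real = torch.randn(200, 32)                      # real tail-class features
syn = torch.randn(64, 32, requires_grad=True)    # distilled synthetic features
loss = spectral_matching_loss(real, syn)
loss.backward()
print(loss.item(), syn.grad.shape)
```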