PTL-Diffusion achieves superior manifold-level distributional matching by embedding phase structure directly into the diffusion process, outperforming traditional models.
MOSS-Audio achieves state-of-the-art performance in audio understanding tasks by effectively integrating temporal cues and deep acoustic features, setting a new benchmark for audio-language models.
Training your radio map estimation model on i.i.d. data? Prepare for a rude awakening when deploying on UAVs, where trajectory-based sampling can tank performance.
Unlock "white-box" reasoning in vision-language models: SegCompass's sparse autoencoder creates an interpretable bridge between visual perception and chain-of-thought, outperforming black-box alignment methods.
Visual token dominance is the hidden culprit behind LVLM inference inefficiency, and this paper dissects the problem to reveal how to navigate the fidelity-efficiency tradeoff.
Zero-shot RL agents can now learn better representations by focusing on dynamics-relevant image regions, leading to state-of-the-art generalization performance.
MLLMs can achieve up to 7.9x KV cache compression and 1.52x faster decoding without sacrificing performance by intelligently compressing different attention heads with distinct strategies.