Forget hand-crafted prompts: Ctx2Skill lets language models bootstrap their own skills from context, learning to reason better without any human labels.
Multi-frame monocular scene flow estimation gets a serious boost with RAFT-MSF++, which uses Geometry-Motion Feature fusion to achieve state-of-the-art results and improved robustness to occlusions.
Synthesizing realistic anomaly images for industrial assembly is now possible thanks to a diffusion model that respects component pose and assembly relationships.
Video-LLMs can be sped up by nearly 3x without sacrificing performance, simply by loosening the strict matching requirements of speculative decoding and focusing on visual-semantic relevance.
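The idea of loosening speculative decoding's acceptance rule can be sketched in a few lines. This toy (not the paper's actual criterion — the threshold `tau` and the ratio test are illustrative assumptions) accepts a drafted token whenever the target model assigns it at least a fraction of the draft model's probability, instead of strict rejection sampling:

```python
def relaxed_accept(p_draft, p_target, tau=0.3):
    """Illustrative relaxed criterion: accept the draft token when the
    target model gives it at least tau * the draft model's probability.
    Strict speculative decoding corresponds to a much harsher test."""
    return p_target >= tau * p_draft

def speculative_step(draft_tokens, draft_dists, target_dists, tau=0.3):
    """Keep the longest prefix of drafted tokens that passes the
    relaxed acceptance test; stop at the first rejection."""
    accepted = []
    for tok, q, p in zip(draft_tokens, draft_dists, target_dists):
        if relaxed_accept(q[tok], p[tok], tau):
            accepted.append(tok)
        else:
            break
    return accepted
```

A smaller `tau` accepts more drafted tokens per step (more speedup, looser match); `tau=1.0` approximates a strict accept-only-if-target-agrees regime.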
Visual token dominance is the hidden culprit behind LVLM inference inefficiency, and this paper dissects the problem to reveal how to navigate the fidelity-efficiency tradeoff.
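One common way to act on visual token dominance — kept deliberately generic here, since the teaser doesn't specify the paper's method — is to prune low-attention visual tokens, with a keep ratio that directly exposes the fidelity-efficiency tradeoff:

```python
def prune_visual_tokens(attn_scores, keep_ratio=0.5):
    """Keep the top-k visual tokens by attention mass (indices returned
    in original order). Smaller keep_ratio = fewer tokens = faster
    inference but lower fidelity. Purely illustrative pruning rule."""
    k = max(1, int(len(attn_scores) * keep_ratio))
    top = sorted(range(len(attn_scores)),
                 key=lambda i: attn_scores[i], reverse=True)[:k]
    return sorted(top)
```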
RLVR models exhibit "Early Correctness Coherence" under noisy supervision, suggesting a surprising opportunity for self-correction via dynamic label refinement.
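Dynamic label refinement of the kind hinted at here can be sketched as: if the model's recent verdicts on a sample consistently contradict its (possibly noisy) reward label, flip the label. The window size, threshold, and class name below are assumptions for illustration, not the paper's algorithm:

```python
from collections import deque

class LabelRefiner:
    """Toy dynamic label refinement for binary (0/1) reward labels.
    Tracks the model's recent verdicts per sample; once a full window
    of verdicts overwhelmingly disagrees with the stored label, the
    label is flipped. Hypothetical sketch, not the paper's method."""

    def __init__(self, window=5, flip_threshold=0.8):
        self.window = window
        self.flip_threshold = flip_threshold
        self.history = {}  # sample_id -> recent model verdicts

    def observe(self, sample_id, model_verdict, label):
        h = self.history.setdefault(sample_id, deque(maxlen=self.window))
        h.append(model_verdict)
        if len(h) == self.window:
            disagree = sum(1 for v in h if v != label) / self.window
            if disagree >= self.flip_threshold:
                return 1 - label  # flip the binary label
        return label
```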
Robots can now follow language navigation instructions more reliably even when objects block their view, thanks to a new method that reasons about the environment in a bird's-eye view rather than relying on visible pixels.
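The core intuition — that a top-down map registers objects even when they are occluded in the camera view — can be shown with a minimal occupancy-grid projection (grid layout and cell size here are made-up parameters, not the paper's representation):

```python
def to_bev(points, grid_size=4, cell=1.0):
    """Project 3D points (x, y, z) onto a top-down occupancy grid,
    discarding height. An object hidden behind another in the camera
    image still lands in its own BEV cell. Illustrative sketch only."""
    grid = [[0] * grid_size for _ in range(grid_size)]
    for x, y, _z in points:
        i, j = int(x // cell), int(y // cell)
        if 0 <= i < grid_size and 0 <= j < grid_size:
            grid[i][j] = 1
    return grid
```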
Achieve 95% recovery success in robotic manufacturing by giving vision-language models a persistent, queryable memory of the world.
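A "persistent, queryable memory of the world" reduces, at its simplest, to an append-only log of timestamped observations that a planner can query for the latest known state. The schema and names below are invented for illustration and say nothing about the paper's actual memory design:

```python
import time

class SceneMemory:
    """Minimal persistent scene memory: append timestamped object
    observations; queries return the most recent known state, letting
    a recovery policy reason about objects it can no longer see.
    Hypothetical sketch, not the paper's implementation."""

    def __init__(self):
        self._log = []  # list of (timestamp, object, state)

    def record(self, obj, state, t=None):
        self._log.append((t if t is not None else time.time(), obj, state))

    def query(self, obj):
        # Scan newest-first for the latest observation of this object.
        for _t, o, s in reversed(self._log):
            if o == obj:
                return s
        return None
```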