Forget finetuning encoders: representing human motion as structured text unlocks surprisingly strong performance on motion understanding tasks by directly leveraging LLMs' pretrained knowledge.
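The core idea above is serializing motion into text an LLM can read directly. A minimal sketch of one plausible serialization, assuming per-frame named joints with 3D coordinates (the exact format and names are illustrative, not from the paper):

```python
# Hedged sketch: turn a motion sequence (list of frames, each a dict of
# joint name -> (x, y, z)) into structured text an off-the-shelf LLM can
# consume. The "frame i: joint:(x,y,z)" layout is an assumption.

def motion_to_text(frames):
    """Serialize a motion sequence into one line of text per frame."""
    lines = []
    for i, joints in enumerate(frames):
        coords = " ".join(
            f"{name}:({x:.2f},{y:.2f},{z:.2f})" for name, (x, y, z) in joints.items()
        )
        lines.append(f"frame {i}: {coords}")
    return "\n".join(lines)

# Toy example: a two-frame clip with two joints.
clip = [
    {"hip": (0.0, 1.0, 0.0), "knee": (0.1, 0.5, 0.0)},
    {"hip": (0.0, 1.0, 0.1), "knee": (0.2, 0.5, 0.1)},
]
print(motion_to_text(clip))
```

The serialized string can then be placed in a prompt, letting the LLM's pretrained knowledge handle the understanding task with no encoder finetuning.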
Ditch the noise: FAVE achieves 10x faster sequential recommendations by learning a direct, one-step trajectory from user history to predicted item, bypassing the inefficient "noise-to-data" paradigm.
VLMs struggle to align assembly diagrams and videos because they occupy disjoint visual representation spaces, revealing a fundamental limitation in cross-modal understanding.
Shrinking a 2B vision-language retriever to a 70M text-only model achieves 95% of the original quality and outperforms a 2B baseline, while slashing query latency by 50x.
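One plausible way such a shrink works is embedding distillation: train the small text-only student to reproduce the large retriever's embeddings. A minimal sketch of the training objective, assuming both models emit same-dimension vectors (the loss choice and names are assumptions, not the paper's method):

```python
# Hedged sketch of embedding distillation: penalize the student for
# deviating from the teacher's embedding of the same query. Mean squared
# error is one common choice; cosine-based losses are another.

def mse_distill_loss(student_emb, teacher_emb):
    """Mean squared error between student and teacher embedding vectors."""
    assert len(student_emb) == len(teacher_emb), "embeddings must match in dimension"
    n = len(student_emb)
    return sum((s - t) ** 2 for s, t in zip(student_emb, teacher_emb)) / n

# Toy example: a perfectly matched pair has zero loss.
print(mse_distill_loss([0.2, -0.1, 0.7], [0.2, -0.1, 0.7]))  # 0.0
```

At query time only the small student runs, which is where the latency win comes from: the 2B teacher is used once, offline, to generate targets.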
Ditch global embeddings for text-motion retrieval: this method uses joint-angle motion images and token-patch late interaction to achieve state-of-the-art accuracy and interpretability.
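Token-patch late interaction, as named above, can be sketched in the ColBERT style: score each text token against its best-matching motion-image patch and sum the per-token maxima. The function and variable names below are illustrative:

```python
# Hedged sketch of late-interaction (MaxSim) scoring: each text-token
# embedding is matched to its most similar motion-image patch embedding,
# and the per-token maxima are summed into the retrieval score.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def late_interaction_score(text_tokens, motion_patches):
    """Sum over text tokens of the max similarity to any motion patch."""
    return sum(max(dot(t, p) for p in motion_patches) for t in text_tokens)

# Toy example: 2 text-token embeddings scored against 3 patch embeddings.
text = [[1.0, 0.0], [0.0, 1.0]]
patches = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
score = late_interaction_score(text, patches)  # 0.9 + 0.8
```

Because each token's best patch is identified explicitly, the scoring is interpretable: one can inspect which body-motion patch each word matched, unlike a single global embedding similarity.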
Spotting coordinated fake reviewers just got easier: a new graph learning method boosts detection accuracy by adaptively weighing network diversity and similarity.
Prompt leakage attacks on multi-tenant LLMs are far more efficient than previously thought: a new RL-based method reconstructs prompts with over 12x fewer requests.