Forget training wheels: AnomalyAgent uses the reasoning power of multimodal LLMs to spot anomalies in zero- or few-shot settings, outperforming traditional VLM approaches.
VLA models may excel at visually grounded tasks, but VLA-Trace reveals they still struggle with fine-grained semantic understanding and exhibit distinct modality processing strategies.
Forget hand-crafting mobile benchmarks – PhoneWorld lets you automatically generate them from real-world GUI trajectories, leading to massive performance gains for phone-use agents.
Current MLLM agents struggle to find GUI defects, but a new benchmark and evaluator reveal that the critical bottleneck is detection and, surprisingly, that simply integrating the evaluator's verifiers significantly boosts performance without retraining.
Bridge the gap between offline model scaling and online deployment in recommendation systems: Rec-Distill enables lightweight student models to capture a substantial portion of the performance gains from massive teacher models.
Achieve state-of-the-art talking face generation without any fine-tuning, proving that pre-trained diffusion models like Stable Diffusion already possess strong lip-related semantics.
LLM agents trained with simulated user and tool noise not only become more robust in messy real-world environments, but also surprisingly improve on clean, idealized benchmarks.
Current LLM agents still struggle to infer and leverage user preferences from fragmented, real-world interactions, revealing a substantial gap between their capabilities and the demands of personalized decision-making.
LLMs struggle with structured 2D tasks when inputs are serialized into 1D, revealing a surprising performance gap compared to vision-augmented models that directly process the 2D layout.
LLM-derived user profiles can be powerfully leveraged for recommendation via a surprisingly simple distribution shaping approach, outperforming more complex fusion methods.
By intelligently pruning tokens based on spike timing and activation, Vision SmolMamba achieves state-of-the-art efficiency in spiking neural networks, outperforming even Spiking Mamba.
LLMs can denoise sequential recommendations by disagreeing with the recommendation model itself, leading to more robust performance against noisy user data.
Sequence recommendation models can achieve near-perfect scaling efficiency in distributed training, slashing wasted GPU cycles by up to 90%.
Forget fine-tuning behemoth LLMs for every new task – this paper shows how a tiny, nimble model generating smart supplements can unlock surprisingly strong agentic performance from frozen giants.
LLM agents can reliably infer each other's "warmth" and "competence" from interaction histories, leading to significantly better coordination in complex multi-agent settings.
TLoRA achieves superior performance across multiple tasks while cutting down trainable parameters, redefining efficiency in fine-tuning large language models.
LLM-based ASR can be shrunk to 2.3B parameters and still beat larger models in real-world scenarios by carefully delineating encoder and LLM roles and using a multi-stage training approach.
Dramatically improve short-video search for niche content by unifying memorization and generalization with a lightweight semantic ID framework that boosts long-play rates by 0.664%.
Kuaishou's new Dual-Rerank system slashes latency and boosts user engagement by fusing the best of autoregressive and non-autoregressive generative reranking, proving you can have your cake and eat it too in billion-scale search.
VLMs suffer from "digital agnosia," exhibiting a surprisingly sharp failure to transcribe even small color grids into matrices, revealing a critical gap between visual feature encoding and language generation.
Real-time video generation gets a boost: Salt achieves sharper, more dynamic videos at extremely low inference budgets by explicitly enforcing consistency across denoising steps.
Most "agent skills" hyped for boosting LLMs in software engineering provide almost no benefit in real-world tasks, with 80% yielding zero pass-rate improvement.