UC Santa Cruz
LLM-derived user profiles can be powerfully leveraged for recommendation via a surprisingly simple distribution shaping approach, outperforming more complex fusion methods.
LLMs struggle with structured 2D tasks when inputs are serialized into 1D, revealing a surprising performance gap compared to vision-augmented models that directly process the 2D layout.
By intelligently pruning tokens based on spike timing and activation, Vision SmolMamba achieves state-of-the-art efficiency in spiking neural networks, outperforming even Spiking Mamba.
LLMs can denoise sequential recommendations by disagreeing with the recommendation model itself, leading to more robust performance against noisy user data.
Sequence recommendation models can achieve near-perfect scaling efficiency in distributed training, slashing wasted GPU cycles by up to 90%.
Forget fine-tuning behemoth LLMs for every new task: this paper shows how a tiny, nimble model generating smart supplements can unlock surprisingly strong agentic performance from frozen giants.
LLM agents can reliably infer each other's "warmth" and "competence" from interaction histories, leading to significantly better coordination in complex multi-agent settings.
LLM-based ASR can be shrunk to 2.3B parameters and still beat larger models in real-world scenarios by carefully delineating encoder and LLM roles and using a multi-stage training approach.
TLoRA achieves superior performance across multiple tasks while cutting down trainable parameters, redefining efficiency in fine-tuning large language models.
Dramatically improve short-video search for niche content by unifying memorization and generalization with a lightweight semantic ID framework that boosts long-play rates by +0.664%.
Kuaishou's new Dual-Rerank system slashes latency and boosts user engagement by fusing the best of autoregressive and non-autoregressive generative reranking, proving you can have your cake and eat it too in billion-scale search.
VLMs suffer from "digital agnosia," exhibiting a surprisingly sharp failure to transcribe even small color grids into matrices, revealing a critical gap between visual feature encoding and language generation.
Real-time video generation gets a boost: Salt achieves sharper, more dynamic videos at extremely low inference budgets by explicitly enforcing consistency across denoising steps.