Search papers, labs, and topics across Lattice.
100 papers published across 4 labs.
LLMs exhibit consistent and detectable geographic preferences for brands and cultures, revealing potential biases in market intermediation that persist across user personas.
Spotify's GLIDE model proves that generative LLMs can drive significant gains in podcast discovery and non-habitual listening in a real-world, production environment.
Ditch static embeddings: Generative retrieval, powered by reinforcement learning, lets models dynamically reason about relevance, outperforming larger contrastively-trained models on reasoning-intensive tasks.
Finding a hidden node in a graph just got a whole lot faster: a new algorithm slashes the average search cost with provable approximation guarantees, even with non-uniform query costs.
Naive fine-tuning of VLMs for multimodal sequential recommendation causes catastrophic modality collapse, but can be fixed with gradient rebalancing and cross-modal regularization.
Stop training LLMs to assign arbitrary scores to papers in isolation; comparison-based ranking unlocks significantly better generalization and accuracy in paper evaluation.
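The comparison-based idea above can be sketched generically: plug a pairwise preference call into a standard comparison sort instead of scoring each paper in isolation. The `llm_prefers` function here is a hypothetical stand-in (stubbed with word counts so the sketch runs), not the paper's actual judge.

```python
# Sketch: rank items by pairwise comparison rather than independent scores.
# `llm_prefers` is a hypothetical stub; a real system would query an LLM judge.
from functools import cmp_to_key

def llm_prefers(paper_a, paper_b):
    """Stub judge: pretend the longer abstract is preferred."""
    return len(paper_a.split()) > len(paper_b.split())

def rank_by_comparison(papers, prefers=llm_prefers):
    # Preferred item sorts earlier; ties are left to the sort's discretion.
    def cmp(a, b):
        return -1 if prefers(a, b) else 1
    return sorted(papers, key=cmp_to_key(cmp))

papers = [
    "short note",
    "a much longer and more detailed abstract",
    "mid size abstract",
]
ranked = rank_by_comparison(papers)
```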
Existing citation recommendation benchmarks overestimate real-world performance because they fail to account for the temporal constraints of recommending citations for *new* papers.
Forget tool-augmented systems: NEO shows you can consolidate search, recommendation, and reasoning into a single language-steerable LLM by representing items as SIDs and interleaving them with natural language.
Federated recommendation systems can now better adapt to evolving user preferences without sacrificing privacy, thanks to a novel approach that retains historical knowledge and transfers insights between similar users.
Semantic sorting in LLMs can be twice as fast with no loss in accuracy by strategically combining listwise ranking algorithms.
LLMs forget up to 60% of facts when summarizing and erode over half of project constraints during iterative compaction, but a simple discrete memory system (KOs) fixes this while slashing costs by 252x.
Agentic LLMs are surprisingly vulnerable: a new framework finds successful attacks in 84% of attempts by escalating prompt injection techniques across multiple stages.
Seemingly sophisticated dense retrieval methods can catastrophically fail at contradiction detection due to "Semantic Collapse," highlighting the surprising effectiveness of a simple, decoupled lexical approach for reliable biomedical QA.
LLMs can be systematically shifted from stochastic pattern-matchers to verified truth-seekers using a carefully orchestrated, multi-stage retrieval and verification pipeline.
RAG systems can now achieve 8x better PII leakage protection without sacrificing utility or speed, thanks to a novel "Verify-then-Route" paradigm.
"Superspreader" networks on Twitter amplify contrarian scientific viewpoints, influencing news media coverage and potentially distorting public understanding of science.
LLM-powered recommendation agents, despite their reasoning prowess, are easily manipulated by contextual biases in high-stakes scenarios like paper review and job recruitment.
LLMs armed with RAG can reconstruct cyberattacks with high precision and recall, but the best model for the job depends on your budget: DeepSeek V3 matches Claude Sonnet 4's accuracy at 1/15th the cost.
Forget chasing leaderboard hype: this study reveals that larger embedding models and strategic concatenation are key to unlocking LLM-powered tabular prediction, regardless of public rankings.
No training needed: ARAM dynamically adjusts retrieved context guidance in masked diffusion models based on signal quality, resolving retrieval-prior conflicts on the fly.
Retrieval-augmented LLM agents can learn to learn from experience, achieving significantly better generalization on unseen tasks by combining the strengths of fine-tuning and in-context retrieval.
Discover emergent narratives in real-time without predefined labels, revealing how information evolves during crises.
Stop chasing leaderboard gains on generic benchmarks: PJB reveals that domain-specific weaknesses in person-job retrieval far outweigh the benefits of general model upgrades, and that query understanding modules can actually hurt performance.
LLMs can now recommend drugs with state-of-the-art accuracy by synthesizing individual patient context with the prescribing tendencies of similar cases, outperforming guideline-based and similar-patient retrieval methods.
Forget subjective scouting reports: this framework objectively identifies undervalued football players by blending market dynamics with news sentiment, offering a data-driven edge in talent acquisition.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
Forget full finetuning: OPERA's dynamic pruning lets you adapt retrieval models to new domains with better ranking and recall, in half the time.
Temporal CNNs and LSTMs can slash inventory costs and boost fill rates compared to traditional forecasting methods, offering a tangible advantage for supply chain optimization.
Symbolic planning unlocks significant gains in RTL synthesis and summarization, boosting LLM performance by 20% without fine-tuning.
Forget generic code generation: this work shows that structure-aware retrieval of domain-specific examples slashes the debugging needed to get LLMs to produce working scientific visualization pipelines.
Wikipedia editors can now get AI assistance to identify claims needing citations in 10 languages, improving content reliability at scale.
LLMs struggle with questions requiring up-to-date information, especially when the recency requirement is context-dependent, highlighting a critical gap in temporal reasoning.
Achieve personalized generation with cloud-scale reasoning while preserving user privacy, thanks to a novel asymmetric collaboration framework that's also 2x faster.
CRAG's retrieval evaluator surprisingly relies on named entity alignment, not semantic similarity, to judge document quality.
Off-the-shelf foundation models struggle with instance-level visual product search in industrial settings, often falling short compared to domain-specific models.
LSTM-based intrusion detection can achieve 99.42% accuracy in identifying cyber threats within IoT networks, slightly outperforming CNN-based approaches.
By intelligently injecting and removing noise, RaDAR significantly improves recommendation accuracy in sparse and noisy collaborative filtering environments.
Thompson Sampling gets a major upgrade with C3, outperforming existing methods by 12.4% in click-through rate on the Microsoft News Dataset by better handling non-stationary correlated rewards.
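C3's internals aren't described in the one-line summary above; for context, this is the classical Bernoulli Thompson Sampling baseline it upgrades: sample a plausible click-rate per arm from its Beta posterior and pull the argmax. The counts and prior below are illustrative.

```python
# Classical Bernoulli Thompson Sampling (the baseline, not C3 itself).
import random

def thompson_step(successes, failures, rng=random):
    """Pick an arm index given per-arm success/failure counts."""
    samples = [
        rng.betavariate(s + 1, f + 1)  # Beta(1, 1) uniform prior
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)

rng = random.Random(0)
# Arm 0: ~50% click rate; arm 1: ~5% click rate.
arm = thompson_step([50, 5], [50, 95], rng=rng)
```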
Hyperbolic GNNs on Bitcoin transaction networks need careful tuning of learning rate and curvature to stabilize high-dimensional embeddings, a factor often overlooked.
LLMs can dynamically optimize the training curriculum of multimodal retrieval models, leading to significant gains in retrieval accuracy by adapting to the model's evolving state.
Lightweight LLMs like Gemini 2.0 and GPT-3.5 can extract key metadata from cloud incident reports with surprisingly high accuracy (75-95%), offering a cost-effective alternative to larger models.
Achieve 91%+ Hit@1 retrieval accuracy in a local-first long-term memory system for AI assistants by combining vector recall, keyword recall, RRF, and re-ranking, while maintaining sub-90ms search latency at scale.
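RRF, mentioned in the pipeline above, is a standard technique for merging ranked lists from heterogeneous retrievers (e.g., vector and keyword recall) before re-ranking: each document scores the sum of 1/(k + rank) over the lists it appears in. This sketch uses the conventional k=60 default; function names are illustrative, not from the paper.

```python
# Reciprocal Rank Fusion: merge several ranked lists of doc IDs.
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Score each doc as sum(1 / (k + rank)); return docs best-first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # The fused list would then be handed to a re-ranker.
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d7"]
keyword_hits = ["d1", "d9", "d3"]
fused = rrf_fuse([vector_hits, keyword_hits])
```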
Generative search engines create "answer bubbles" by selectively citing and framing information, leading to divergent information realities compared to traditional search.
Escape the flatland of traditional recommender systems: RecBundle uses differential geometry to disentangle user interactions from preferences, opening the door to understanding and mitigating systemic biases.
E-commerce search LLMs can be made both more knowledgeable and secure via a surprisingly simple three-stage framework of data synthesis, parameter-efficient pre-training, and dual-path alignment.
Unsupervised detection of adversarial attacks in RAG systems is possible using generator activations and uncertainty measures, even without knowing the target prompt.
By adaptively routing medical image queries to global and local feature experts, HMAR achieves state-of-the-art retrieval accuracy without relying on expensive bounding box annotations.
Counterfactual examples supercharge visual in-context learning, enabling smaller vision-language models to outperform larger ones by focusing on causal relationships rather than superficial correlations.
LLMs struggle to selectively apply user preferences stored in memory, often misapplying them even when social norms dictate otherwise, revealing a critical gap in context-aware personalization.
Synthetic benchmarks can't catch the nuances of personalized deep research, as real users revealed nine critical errors that LLM judges missed entirely.
Restaurant recommendations get a flavor upgrade: ReFORM uses LLMs to distill user preferences and item qualities from reviews, then spotlights the decision factors that truly matter.
Instead of just gathering more context, turn retrieval into a mechanism for actively testing and refining a provisional answer, yielding substantial gains in factual QA accuracy.
Achieve state-of-the-art multi-hop question answering by pre-computing bridging facts at index time, eliminating the need for complex online reasoning or graph traversal.
LLMs can now remember and reason about long-term conversations with significantly improved accuracy thanks to a new temporal-aware memory framework that structures dialogue into event calendars.
LLM agents can now leverage a unified memory framework that dynamically adapts to different question types, enabling more coherent and user-centric long-horizon dialogues.
Conformal factuality for RAG breaks down when faced with distribution shifts or distractors, forcing a trade-off between factuality and informativeness.
You can estimate the completeness of a web crawl using only its own historical data, without needing external datasets.
Small language models can achieve surprisingly robust question answering by actively clustering their memories into semantically coherent groups, outperforming standard retrieval methods.
Imperfect knowledge graphs can lead to retrieval drift and hallucinations in multi-hop reasoning, but C2RAG offers a robust solution that improves EM by 3.4% and F1 by 3.9% over existing methods.
Unlock cross-jurisdictional legal analysis by automatically identifying corresponding legal provisions across national systems using multilingual embeddings and XML schema conversions.
Extracting user profiles from recommendation lists is now more accurate thanks to RAPI, a new framework that leverages BERT embeddings and sample augmentation to boost inference accuracy by dynamically weighting user characteristics.
Forget hand-crafted features: this system uses an LLM to automatically discover features from event sequences that outperform even state-of-the-art embeddings by up to 5.8%.
Users on Xiaohongshu are generally happy with the platform's new translation feature, but their creative use of slang, emoji, and coded language highlights the challenges of real-world machine translation.
Combining multiple embedding models and looking for consensus flags just 1% of network records as anomalous, but flags *only* synthetic attacks, enabling security teams to focus on the needle in the haystack.
Reinforcement learning unlocks fast, high-quality consensus ranking aggregation, outperforming classical heuristics and ILP solvers for the NP-hard Kemeny optimization problem.
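The Kemeny objective named above is: find the permutation minimizing total Kendall-tau distance (pairwise disagreements) to the input rankings. Exact optimization is NP-hard, which is why heuristics, ILP, and now RL are used; this brute force only works for tiny item sets and just illustrates the objective being approximated.

```python
# Brute-force Kemeny consensus for a handful of items (illustration only).
from itertools import permutations

def kendall_tau(a, b):
    """Count item pairs ordered differently in rankings a and b."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for i in range(len(a))
        for j in range(i + 1, len(a))
        if pos_b[a[i]] > pos_b[a[j]]
    )

def kemeny_brute_force(rankings):
    items = rankings[0]
    return min(
        permutations(items),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )

votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
consensus = kemeny_brute_force(votes)
```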
Reinforcement learning can now handle active feature selection in high-dimensional datasets by intelligently pruning the feature search space and regularizing decision sequences, outperforming existing methods in accuracy and policy complexity.
Privacy-preserving RAG gets a massive speed boost (3-300x) by ditching secure sorting for an interactive bisection method that also supports arbitrary top-k retrieval.
LLMs can automate up to 90% of radiology report annotations with high accuracy, slashing expert review time.
Even GPT-4 struggles with long-term preference capture in e-commerce, but a lightweight, jointly-trained LLM agent can beat it.
Forget complex LLM-based structuring: simple, deterministic retrieval with smart ranking beats state-of-the-art conversational memory systems while using 8.5x fewer tokens.
Provable guarantees for active seriation offer a sample-efficient route to ordering recovery from noisy pairwise comparisons.
LLMs can plan effective e-commerce searches within strict latency budgets by first probing the retrieval environment to ground their reasoning.
RAG systems readily absorb and amplify ideological biases present in retrieved documents, even more so when prompts explicitly describe the ideological dimensions at play.
Voronoi cells and whitening can be combined to create LiDAR place recognition descriptors that implicitly measure Mahalanobis distance, improving performance on standard benchmarks.
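The Mahalanobis connection above rests on the standard whitening identity (a general fact, not specific to the paper): with covariance $\Sigma$ and whitening matrix $W = \Sigma^{-1/2}$,

```latex
d_M(x, y)^2 = (x - y)^\top \Sigma^{-1} (x - y)
            = \bigl(\Sigma^{-1/2}(x - y)\bigr)^\top \bigl(\Sigma^{-1/2}(x - y)\bigr)
            = \lVert W x - W y \rVert_2^2
```

so plain Euclidean distance between whitened descriptors equals Mahalanobis distance between the originals.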
Tired of RAG evaluation datasets with legal baggage or LLM-hallucinated inconsistencies? OrgForge offers a multi-agent simulation environment that guarantees ground truth, temporal structure, and cross-artifact consistency for realistic corporate scenarios.
LLMs can now achieve state-of-the-art performance in transaction analytics by grounding them with a retrieval-augmented knowledge base of behavioral patterns derived from financial transactions.
Insurance LLM slashes hallucinations to a record-low 0.6% while beating DeepSeek and Gemini, proving you *can* have domain mastery without sacrificing general smarts.
Stop brute-forcing question answering over hybrid data lakes: A.DOT Planner compiles NL queries into DAGs for efficient, multi-hop reasoning across structured and unstructured data, boosting correctness by 14.8%.
Oblivis enables practical, privacy-preserving database queries in cloud and edge settings, achieving up to 10^6x speedups over standard Oblivious Transfer methods.
Capture and preserve the expertise of aging workforces with Expert Mind, a RAG-based system that turns tacit knowledge into a queryable asset.
Surface-level metrics like BLEU are misleading for evaluating dialogue systems, as human and LLM judges reveal critical flaws in coherence and consistency that these metrics miss entirely.
Forget retraining: GenRecEdit injects knowledge about new items into generative recommendation models, boosting cold-start performance by up to 10x while slashing training time by 90%.
Stop manually synthesizing related work: ResearchPilot automates the process with a self-hostable, multi-agent system that extracts, synthesizes, and drafts literature reviews.
Graph-based fraud detection gets a boost with STC-MixHop, a framework that leverages multi-scale neighborhood diffusion and temporal consistency to outperform existing methods, especially when relational dependencies are key.
The pursuit of "open search" risks being co-opted by powerful corporations unless it shifts focus from technical openness to the actual capabilities afforded to users.
Recommendation systems can now systematically debias engagement signals across user, content, and model dimensions using a lightweight, in-model approach, leading to more accurate value models and stable ecosystem dynamics.
Stop wasting compute on query expansion: focusing it on re-ranking with stronger models and deeper candidate pools yields significantly better retrieval performance in reasoning-intensive tasks.
Replace ad-hoc memory decay and similarity metrics with provably convergent Riemannian dynamics and Fisher information, boosting agent memory performance by up to 20% while enabling zero-LLM deployments for data sovereignty.
Text-to-video retrieval models struggle to distinguish videos that differ only in their final state, revealing a critical gap in temporal reasoning and end-state grounding.
LLMs answering medical questions leak surprisingly large amounts of patient information, exposing a critical privacy-utility tradeoff that current benchmarks miss.
You don't need a cloud to ask EHRs questions: surprisingly competitive clinical question answering is possible with commodity hardware and local models.
You can boost fairness in LLM recommenders by up to 74% simply by prompting them to be fair, but watch out for unintended over-promotion of specific groups.
Traditional text embedding benchmarks fail to capture the nuances of long-horizon memory retrieval, but this new benchmark reveals that bigger models don't always win, and performance on standard tasks doesn't guarantee success in complex, context-dependent memory scenarios.
Shrinking a 2B vision-language retriever to a 70M text-only model achieves 95% of the original quality and outperforms a 2B baseline, while slashing query latency by 50x.
Injecting user mood into music recommendation boosts perceived quality, proving that personalized listening experiences can be significantly improved by considering emotional state.
GraphRAG, thought to be more robust to poisoning attacks due to its KG abstraction, is surprisingly vulnerable to KEPo, a novel attack that forges knowledge evolution paths to inject toxic events.
Crypto KOL credibility isn't just about credentials; it's a carefully performed balancing act between psychological needs, community expectations, and ethical self-regulation.
Agentic RAG systems can be made significantly more efficient and accurate simply by adding a contextualization module and de-duplicating retrieved documents at test time.
A simple modification to Dijkstra, Transfer Aware Dijkstra (TAD), doubles the speed of public transit routing while correctly handling buffer times, outperforming state-of-the-art RAPTOR-based algorithms.
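A minimal sketch of the idea the summary describes, assuming TAD's key move is to charge a buffer time whenever a route switches lines (which requires a (node, line) state space rather than plain nodes). The example network is illustrative; TAD's actual formulation may differ.

```python
# Dijkstra over (node, line) states, with a transfer buffer on line changes.
import heapq

def transfer_aware_dijkstra(edges, source, target, buffer_time):
    """edges: list of (u, v, line, travel_time). Returns min arrival cost."""
    adj = {}
    for u, v, line, t in edges:
        adj.setdefault(u, []).append((v, line, t))
    best = {}                       # (node, line) -> best cost seen
    heap = [(0, source, None)]      # line=None means "not boarded yet"
    while heap:
        cost, node, line = heapq.heappop(heap)
        if node == target:
            return cost
        if best.get((node, line), float("inf")) <= cost:
            continue
        best[(node, line)] = cost
        for nxt, nxt_line, t in adj.get(node, []):
            # Changing lines costs an extra buffer_time.
            penalty = buffer_time if line is not None and nxt_line != line else 0
            heapq.heappush(heap, (cost + penalty + t, nxt, nxt_line))
    return float("inf")

edges = [
    ("A", "B", "red", 5),
    ("B", "C", "red", 5),
    ("B", "C", "blue", 2),  # faster, but switching red->blue costs a buffer
]
cost = transfer_aware_dijkstra(edges, "A", "C", buffer_time=4)
```

With the buffer, staying on the red line (cost 10) beats the nominally faster blue hop (5 + 4 + 2 = 11); with buffer_time=0 the blue hop wins.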
Forget brittle retrieval: QChunker uses a question-aware multi-agent debate to restructure RAG from retrieval-augmentation to *understanding*-retrieval-augmentation, boosting performance across diverse domains.