100 papers published across 12 labs.
LLM-generated text alone can be a surprisingly effective and cost-efficient source of feedback for pseudo-relevance feedback, rivaling corpus-derived feedback in low-resource information retrieval tasks.
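The mechanism is simple to sketch: instead of expanding the query with terms from top-ranked corpus documents, expand it with terms from text an LLM generates for the query. In this minimal illustration, `generate_pseudo_feedback` and its canned passage are hypothetical stand-ins for a real LLM call:

```python
def generate_pseudo_feedback(query):
    """Stand-in for an LLM call that writes a short hypothetical answer
    passage for the query (hypothetical helper; a real system calls a model)."""
    canned = {
        "treating malaria": "artemisinin combination therapy chloroquine mosquito nets",
    }
    return canned.get(query, "")

def expand_query(query, num_terms=5):
    """Pseudo-relevance feedback with LLM text in place of top-ranked
    corpus documents: append the first few novel generated terms."""
    feedback = generate_pseudo_feedback(query)
    extra = [t for t in feedback.split() if t not in query.split()][:num_terms]
    return query + " " + " ".join(extra)

print(expand_query("treating malaria"))
# → treating malaria artemisinin combination therapy chloroquine mosquito
```

The expanded query then goes to any standard retriever, which is what makes the approach attractive in low-resource settings: no corpus-side feedback pass is needed.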
Reasoning rerankers don't magically fix fairness issues in search, preserving the biases of their input rankings despite boosting relevance.
Agentic search gets a meta-RL boost: MR-Search learns to self-reflect and adapt search strategies across episodes, significantly outperforming standard RL baselines.
By modeling contextual relationships between DNS queries, DNS-GT significantly improves domain name embedding quality, leading to better performance in botnet detection and domain classification.
By combining differentiable indexing with isotropic geometric optimization, DGI achieves state-of-the-art generative retrieval, especially for long-tail items that are often missed by other methods.
LLMGreenRec shows how LLMs can bridge the gap between users' green intentions and their actual purchases, while simultaneously reducing the recommender system's carbon footprint.
Forget brittle KG traversals: MDER-DR's entity-centric summaries and decomposed queries boost multi-hop QA accuracy by up to 66% over standard RAG.
Hypergraphs and sampling can speed up exploratory business intelligence queries by over 16x compared to Neo4j, while maintaining high accuracy.
A massive, bilingual, authority-grounded dataset could finally make AI-assisted cataloging a reality.
Spot rug-pulls before they happen: a new framework combines blockchain data with social media buzz to predict crypto scams with improved accuracy.
Unlock millions of natural history specimens with a conversational AI that understands complex queries and dynamically retrieves data from live museum APIs.
News recommendations get a boost by modeling user interests as a stage-wise evolution, capturing both long-term preferences and rapidly shifting short-term interests.
Forget contrastive learning: LLM2Vec-Gen learns text embeddings by representing the *response* an LLM would generate, unlocking safety and reasoning abilities for embedding tasks.
Pinpointing performance bottlenecks in RAG pipelines just got easier: RAGPerf offers a modular benchmarking framework to dissect and optimize each component.
Ditching flat text for structured linked data in RAG systems can boost accuracy by nearly 30%, but only if you go beyond basic JSON-LD and add agent-friendly instructions and neural search.
Ditch the interleaved item-action token mess: new architectures slash sequence complexity by 50% in generative recommenders, boosting performance and cutting training time.
Item agents that self-promote can simultaneously boost recommendation accuracy and fairness, overturning the assumption that these goals are inherently at odds.
Research on coded caching, crucial for modern content delivery, often treats security as an afterthought, resulting in fragmented solutions that this review seeks to unify and improve.
LLMs can now autonomously retrieve relevant memories from a database using specialized tools, significantly improving performance on long-term conversational question answering.
ZipPIR delivers SimplePIR-level throughput without the massive client-side storage, finally making high-performance private information retrieval practical for resource-constrained devices.
Stop treating concept drift as one thing: DynaME's hybrid approach, separating recurring and emergent drifts, unlocks better online time series forecasting.
Achieve fine-grained access control in searchable encryption without re-encryption or excessive interaction, enabling practical multi-client deployments in dynamic clouds.
Forget retraining: Ego personalizes VLMs on the fly by extracting and leveraging visual tokens that represent specific concepts using the model's internal attention.
Now you can test if your AI system is ready for the EU AI Act, thanks to a new benchmark that combines legal expertise and LLM-generated scenarios.
Reverse image search, a key tool for visual fact-checking, often amplifies misinformation and irrelevant content, burying debunking information.
LLM agents can now achieve a +41pp boost in first-try success and 100% accuracy in 2-way logistics compositions by using PRECEPT's novel combination of retrieval, memory, and prompt evolution.
Forget relying on just ingredients: this method shows how fusing semantic, lexical, and nutritional aspects significantly improves recipe similarity estimation, aligning more closely with expert judgment.
Forget brittle multi-hop reasoning: TaSR-RAG's taxonomy-guided triple matching boosts RAG performance by 14% without costly graph construction.
Forget expensive fine-tuning: FoodOntoRAG links food entities with near SOTA accuracy while adapting to evolving ontologies using a clever RAG architecture with retrieval, selection, scoring, and synonym generation agents.
LLM-powered recommendation agents can now autonomously investigate and bridge information gaps, leading to better recommendations, thanks to a new tool-augmented reasoning framework.
Recommendation welfare can provably exceed what any learner-measurable treatment policy achieves when downstream actors possess private information, forcing a critical re-evaluation of learning objectives in bandit settings with noncompliance.
Achieve RAG efficiency without sacrificing accuracy: LooComp prunes context by identifying and retaining only the most critical sentences for answering a query.
Forget fine-tuning: this training-free method boosts retrieval accuracy for tricky negation queries by up to 10% using clever embedding optimization.
Ditch global embeddings for text-motion retrieval: this method uses joint-angle motion images and token-patch late interaction to achieve state-of-the-art accuracy and interpretability.
Retrieval-augmented agents get a serious reasoning boost by explicitly evaluating their own retrieval quality at each step, leading to state-of-the-art performance on multi-hop question answering.
LLMs can now retrieve memories like humans, using a fast familiarity check or a deliberate recollection process, leading to better personalization without overwhelming the model with irrelevant context.
Spectrum regulators can now leverage AI to dynamically plan and allocate spectrum resources, thanks to a new data-driven approach that accurately forecasts demand with high reliability across diverse urban environments.
Ditch the IPW variance headache: a new nonparametric weighting method slashes variance in off-policy evaluation without sacrificing bias.
$P^2$GNN's plug-and-play prototype approach boosts GNN performance by injecting global context and denoising local neighborhoods, achieving state-of-the-art results across diverse datasets.
Tired of sifting through mountains of internal docs? This RAG system uses a clever two-tiered vector DB to surface the right physics analysis, not just keywords.
Forget tweaking knobs: this new Gram-matrix-based audio representation lets you *retrieve* the perfect, editable audio effect preset, outperforming standard methods.
Language models often disregard provided context, choosing instead to rely on potentially outdated or conflicting information learned during pre-training, revealing a critical flaw in their knowledge integration.
LLMs can drastically reduce manual effort for domain experts in accessing complex food and nutrition data via RAG, but still struggle with queries that exceed the representational scope of the metadata.
Stop blindly rewriting content: AgentGEO diagnoses *why* documents fail to be cited in AI responses, leading to a 40% boost in citations with minimal content changes.
Token pruning in dense retrieval gets a geometric upgrade: Voronoi cells offer a principled way to shrink your index without sacrificing search quality.
Can RAG systems handle complex, multi-sentence queries while maintaining factual grounding and transparency?
Meta Pixel's default settings lead to near-ubiquitous tracking of user activity and identity, even on health-related websites, while advertised tracking restrictions are easily bypassed.
Confidence-based abstention in ranked decision systems often fails due to overlooked contextual uncertainty, challenging the common practice of exception-based intervention.
A single graph foundation model can now achieve state-of-the-art anomaly detection across diverse graph domains, thanks to a new theory of "Anomaly Disassortativity" that tackles domain shift.
Slash embedded software testing time by up to 66% with an LLM-powered RAG pipeline that generates 270 syntactically correct unit tests per hour.
LLMs may secretly be better at information retrieval than embedding similarity suggests, but current datasets are too "short-sighted" to prove it.
Forget local semantic alignment: CAST unlocks temporally coherent video retrieval and generation by explicitly modeling visual state transitions.
LoopLens reveals a stark divide in how musicians with and without domain expertise approach creative search for music loops, highlighting the need for vocabulary-independent discovery tools.
Generative search rankings are far more unstable than you think: single-run citation metrics provide a misleadingly precise view of domain visibility.
Even the most advanced LLMs stumble when asked to reason over a large, heterogeneous document corpus, achieving only 34% accuracy on the new OfficeQA Pro benchmark despite direct access to the relevant documents.
Forget exhaustive enumeration: a Transformer-based reinforcement learning approach can efficiently optimize sequential service region design under uncertainty, outperforming standard DRL methods.
Spotting coordinated fake reviewers just got easier: a new graph learning method boosts detection accuracy by adaptively weighing network diversity and similarity.
Turns out, buying stars and downloads for open-source software doesn't actually trick developers into using it.
Online A/B testing's classic Difference-in-Means estimator is just off-policy Inverse Propensity Scoring in disguise.
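The identity behind this claim is easy to check numerically: Difference-in-Means coincides with IPS once the empirical assignment rates are plugged in as propensities. The simulation below is illustrative, not taken from the paper:

```python
import random

def difference_in_means(arms, rewards):
    """Classic A/B estimate: mean reward under treatment minus control."""
    treat = [r for a, r in zip(arms, rewards) if a == 1]
    ctrl = [r for a, r in zip(arms, rewards) if a == 0]
    return sum(treat) / len(treat) - sum(ctrl) / len(ctrl)

def ips_value(arms, rewards, target_arm, propensity):
    """Off-policy IPS estimate of the value of always playing target_arm."""
    n = len(arms)
    return sum(r / propensity for a, r in zip(arms, rewards) if a == target_arm) / n

random.seed(0)
arms = [random.randint(0, 1) for _ in range(10_000)]
rewards = [random.random() + 0.2 * a for a in arms]

dm = difference_in_means(arms, rewards)
# Use the *empirical* assignment rate as the propensity and the two
# estimators coincide (up to floating-point noise): DM is IPS with
# estimated propensities.
p1 = arms.count(1) / len(arms)
ips_diff = ips_value(arms, rewards, 1, p1) - ips_value(arms, rewards, 0, 1 - p1)
print(abs(dm - ips_diff) < 1e-6)
# → True
```

With the true 0.5 propensities instead of the empirical ones, the two estimates agree only approximately, which is exactly the bias/variance trade-off the off-policy literature studies.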
Swapping variables in mathematical formulas during graph contrastive learning surprisingly improves retrieval accuracy by preserving crucial algebraic relationships.
Retrieval augmentation lets head avatars handle novel expressions better by mixing in similar expressions from a large unlabeled dataset during training, boosting generalization without extra labels or architecture changes.
Stop blindly optimizing for retrieval relevance in RAG pipelines: coverage-based retrieval metrics are better early indicators of the final generated response's information coverage.
YouTube channels favored by users with extreme ideologies disproportionately produce content laced with anger and grievance, amplifying ideological shifts.
Unlock the hidden knowledge in millions of pathology reports: PathoScribe turns static archives into a reasoning-enabled "living library" accessible via natural language.
LLMs can achieve state-of-the-art results on complex reasoning tasks with far fewer parameters by iteratively excavating and reasoning over external knowledge.
Achieve state-of-the-art personalized gaze estimation by intelligently reweighting pre-trained features, rather than learning new ones from scratch.
By explicitly modeling speech, SAVE leapfrogs existing audio-visual methods for video-text retrieval, achieving substantial gains over the state-of-the-art.
A consensus-driven multi-LLM pipeline can improve information extraction for missing-person investigations, offering a practical approach to leveraging LLMs in high-stakes scenarios.
Ditch the extra embedding model: LLMs can retrieve information almost as well using just their internal representations, cutting complexity and latency.
By decomposing RAG along the document axis with specialized agents, SPD-RAG achieves state-of-the-art performance on multi-document QA while slashing API costs by over 60%.
Current machine unlearning methods for recommender systems struggle with robustness and sequential deletions, especially in attention-based and recurrent models, highlighting a critical gap ERASE helps to expose.
Get near-peak performance for your recommender system across GPUs and TPUs without tedious platform-specific tuning, thanks to a new cross-accelerator graph optimization framework.
Injecting retrieved anatomical priors into text-to-CT generation dramatically improves image fidelity and clinical consistency, offering a scalable path to more realistic medical image synthesis.
Current LLM agents stumble when vital information isn't indexed by search engines, but a new multi-agent framework, UIS-Digger, shows how proactive browsing and file parsing can overcome this limitation.
LLMs struggle to provide reliable answers to Islamic queries, but Fanar-Sadiq's multi-agent architecture, with specialized modules for scripture, jurisprudence, and calculations, delivers grounded and verifiable responses.
Unsupervised graph alignment gets a speed boost: GlobAlign-E slashes computation time by an order of magnitude while simultaneously boosting accuracy by up to 20%.
Text-rich networks get a hierarchical upgrade: TIER leverages LLMs and contrastive learning to build taxonomy-aware node embeddings, significantly outperforming existing methods.
By adaptively fusing low- and high-frequency graph signals based on local anomaly context, SAGAD achieves state-of-the-art graph anomaly detection while scaling linearly to large graphs.
Forget independent feature extraction: a new architecture uses LVLMs to explicitly model the relationships between drone and satellite imagery, substantially boosting geolocalization accuracy.
LLMs can generate better recommendations if they pause to verify their reasoning steps, rather than reasoning in one long chain.
Ditch those clunky MBRs: GP-Tree uses fine-grained grid cells in a prefix tree to speed up spatial queries by up to 10x.
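The general idea, grid-cell paths in a prefix tree instead of bounding rectangles, can be sketched with a quadtree-style encoding. The cell-path scheme below is an illustrative assumption, not GP-Tree's actual layout:

```python
def cell_path(x, y, depth=8):
    """Encode a point in [0,1)^2 as a grid-cell path: at each level,
    one digit 0-3 naming the quadrant (a cell prefix, not an MBR)."""
    path = []
    for _ in range(depth):
        qx, qy = int(x >= 0.5), int(y >= 0.5)
        path.append(qx + 2 * qy)
        x, y = x * 2 - qx, y * 2 - qy
    return tuple(path)

def build_prefix_tree(points, depth=8):
    """Trie keyed by cell paths; each leaf bucket holds its cell's points."""
    tree = {}
    for p in points:
        node = tree
        for digit in cell_path(*p, depth):
            node = node.setdefault(digit, {})
        node.setdefault("pts", []).append(p)
    return tree

def points_under(tree, prefix):
    """Region lookup: descend once along the prefix, then collect the
    whole subtree -- no rectangle-overlap tests needed."""
    node = tree
    for digit in prefix:
        if digit not in node:
            return []
        node = node[digit]
    out, stack = [], [node]
    while stack:
        n = stack.pop()
        out.extend(n.get("pts", []))
        stack.extend(v for k, v in n.items() if k != "pts")
    return out

pts = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9)]
tree = build_prefix_tree(pts)
# Prefix (0,) is the lower-left quadrant.
print(sorted(points_under(tree, (0,))))
# → [(0.1, 0.1), (0.2, 0.3)]
```

Because a region query is a single prefix descent plus a subtree scan, there is no refinement step against overlapping MBRs, which is where the claimed speedups come from.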
Tired of fragmented datasets? SeDa unifies 7.6M+ datasets from 200+ platforms with semantic annotation and provenance tracking, making cross-domain data discovery a breeze.
A hierarchical RAG framework with ensemble inference and LLM-powered query planning crushes the WattBot 2025 Challenge, showing that carefully structured retrieval and answer stabilization are key to high-precision question answering.
RAG can backfire spectacularly on strong LLMs in Quebec insurance QA, causing "context distraction" and performance regressions, even as it massively boosts weaker models.
Recommender systems can move beyond passive item lists: RecPilot's multi-agent framework autonomously explores item spaces and generates user-centric reports, significantly reducing user effort in item evaluation.
Naive retrieval hurts performance when predicting cellular responses to gene perturbations, but a differentiable, cell-type-aware retrieval mechanism like PT-RAG significantly boosts accuracy.
LLMs can now tap into the full power of R's statistical methods: a new retrieval method boosts package retrieval accuracy by 17% by understanding data distributions, not just function names.
Forget task-specific fine-tuning: TSEmbed unlocks SOTA multimodal embeddings by disentangling task objectives with a Mixture-of-Experts and a novel expert-aware negative sampling strategy.
E-commerce retrieval gets a visual boost: domain-specific fine-tuning and two-stage alignment unlock the power of product images, outperforming text-only approaches.
Automatically generating personas from VR app store reviews can efficiently foster empathy and uncover hidden accessibility needs in VR development.
Forget Claude and GPT: KARL, a reinforcement-learning-trained enterprise search agent, achieves Pareto-optimal performance on a diverse suite of search tasks, even outperforming closed models with sufficient compute.
Semantic filtering with LLMs doesn't have to be a slow, linear slog: this new clustering-sampling-voting approach slashes LLM calls by up to 355x without sacrificing accuracy.
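The clustering-sampling-voting pattern fits in a few lines. In this sketch, `llm_is_relevant` is a hypothetical keyword rule standing in for a real LLM call, and the clustering function is assumed given:

```python
import random
from collections import defaultdict

def llm_is_relevant(text):
    """Stand-in for an LLM relevance call (hypothetical: a keyword rule)."""
    return "sports" in text

def cluster_sample_vote(items, cluster_of, sample_size=3, seed=0):
    """Label every item with far fewer LLM calls than one per item:
    group items into clusters, query the LLM only on a small sample
    from each cluster, then propagate the majority vote cluster-wide."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for item in items:
        clusters[cluster_of(item)].append(item)
    labels, llm_calls = {}, 0
    for members in clusters.values():
        sample = rng.sample(members, min(sample_size, len(members)))
        votes = [llm_is_relevant(x) for x in sample]
        llm_calls += len(sample)
        verdict = sum(votes) > len(votes) / 2  # majority vote
        for item in members:
            labels[item] = verdict
    return labels, llm_calls

items = [f"sports story {i}" for i in range(50)] + [f"finance story {i}" for i in range(50)]
labels, calls = cluster_sample_vote(items, cluster_of=lambda t: t.split()[0])
print(calls, labels["sports story 7"], labels["finance story 7"])
# → 6 True False
```

Here 100 items cost 6 LLM calls instead of 100; the savings scale with cluster size, provided clusters are semantically pure enough for the vote to be trustworthy.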
By dynamically weighting historical interactions, TIPS lets sequential recommenders see past the biases of what users *actually* clicked, revealing what they *would* have clicked.
Nail design retrieval gets a major upgrade: NaiLIA leverages dense intent descriptions and palette queries to outperform standard methods, opening the door to more nuanced and personalized image search.
Entity recognition models can effectively spot RAG-powered native ads, even when advertisers try to disguise them with different styles.
Human expertise, often overlooked in black-box bidding models, can be effectively injected into online advertising bid optimization via a dual-process control mechanism, leading to significant performance gains.
Ditch Leiden clustering for GraphRAG: k-core decomposition offers a deterministic, faster, and more effective way to build knowledge graph hierarchies for better LLM reasoning.
Forget personalized PageRank and Node2Vec: Jaccard-biased random walks plus rank aggregation yield surprisingly robust node affinities, outperforming alternatives on diverse graph types.
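A Jaccard-biased walk is a small tweak to a vanilla random walk: each step weights a neighbor by how much its neighborhood overlaps the current node's. The smoothing floor and toy graph below are illustrative assumptions:

```python
import random

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def jaccard_biased_walk(adj, start, length, seed=0):
    """Random walk preferring neighbors whose own neighborhoods overlap
    the current node's (weights proportional to Jaccard similarity, with
    a tiny floor so zero-overlap neighbors remain reachable)."""
    rng = random.Random(seed)
    node, visits = start, []
    for _ in range(length):
        nbrs = list(adj[node])
        weights = [jaccard(adj[node], adj[n]) + 1e-6 for n in nbrs]
        node = rng.choices(nbrs, weights=weights, k=1)[0]
        visits.append(node)
    return visits

# Toy graph: a triangle {a, b, c} with d loosely attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
walk = jaccard_biased_walk(adj, "a", 2000)
# From "a", the walk should favor b and c (shared neighbors) over d.
print(walk.count("d") < walk.count("b"))
# → True
```

Visit counts from such walks give per-node affinity scores; the rank-aggregation step the blurb mentions would then combine rankings from walks started at different seeds.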
MOOSEnger achieves a 93% success rate in generating runnable multiphysics simulation inputs from natural language, while LLMs alone fail 92% of the time.
Achieve expert-level hepatology diagnosis by mimicking multidisciplinary consultation, using an AI system that combines knowledge graph reasoning, clinical guidelines, and a multi-agent system for traceable consensus.