Search papers, labs, and topics across Lattice.
100 papers published across 4 labs.
LLMs exhibit consistent and detectable geographic preferences for brands and cultures, revealing potential biases in market intermediation that persist across user personas.
Spotify's GLIDE model proves that generative LLMs can drive significant gains in podcast discovery and non-habitual listening in a real-world, production environment.
Ditch static embeddings: Generative retrieval, powered by reinforcement learning, lets models dynamically reason about relevance, outperforming larger contrastively-trained models on reasoning-intensive tasks.
Finding a hidden node in a graph just got a whole lot faster: a new algorithm slashes the average search cost with provable approximation guarantees, even with non-uniform query costs.
Naive fine-tuning of VLMs for multimodal sequential recommendation causes catastrophic modality collapse, but can be fixed with gradient rebalancing and cross-modal regularization.
Stop training LLMs to assign arbitrary scores to papers in isolation; comparison-based ranking unlocks significantly better generalization and accuracy in paper evaluation.
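The comparison-based idea above can be sketched generically: plug a pairwise preference call into a standard comparison sort instead of scoring each paper in isolation. The `llm_prefers` function here is a hypothetical stand-in (stubbed with word counts so the sketch runs), not the paper's actual judge.

```python
# Sketch: rank items by pairwise comparison rather than independent scores.
# `llm_prefers` is a hypothetical stub; a real system would query an LLM judge.
from functools import cmp_to_key

def llm_prefers(paper_a, paper_b):
    """Stub judge: pretend the longer abstract is preferred."""
    return len(paper_a.split()) > len(paper_b.split())

def rank_by_comparison(papers, prefers=llm_prefers):
    # Preferred item sorts earlier; ties are left to the sort's discretion.
    def cmp(a, b):
        return -1 if prefers(a, b) else 1
    return sorted(papers, key=cmp_to_key(cmp))

papers = [
    "short note",
    "a much longer and more detailed abstract",
    "mid size abstract",
]
ranked = rank_by_comparison(papers)
```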
Existing citation recommendation benchmarks overestimate real-world performance because they fail to account for the temporal constraints of recommending citations for *new* papers.
Forget tool-augmented systems: NEO shows you can consolidate search, recommendation, and reasoning into a single language-steerable LLM by representing items as SIDs and interleaving them with natural language.
Federated recommendation systems can now better adapt to evolving user preferences without sacrificing privacy, thanks to a novel approach that retains historical knowledge and transfers insights between similar users.
Semantic sorting in LLMs can be twice as fast with no loss in accuracy by strategically combining listwise ranking algorithms.
LLMs forget up to 60% of facts when summarizing and erode over half of project constraints during iterative compaction, but a simple discrete memory system (KOs) fixes this while slashing costs by 252x.
Agentic LLMs are surprisingly vulnerable: a new framework finds successful attacks in 84% of attempts by escalating prompt injection techniques across multiple stages.
Seemingly sophisticated dense retrieval methods can catastrophically fail at contradiction detection due to "Semantic Collapse," highlighting the surprising effectiveness of a simple, decoupled lexical approach for reliable biomedical QA.
LLMs can be systematically shifted from stochastic pattern-matchers to verified truth-seekers using a carefully orchestrated, multi-stage retrieval and verification pipeline.
RAG systems can now achieve 8x better PII leakage protection without sacrificing utility or speed, thanks to a novel "Verify-then-Route" paradigm.
"Superspreader" networks on Twitter amplify contrarian scientific viewpoints, influencing news media coverage and potentially distorting public understanding of science.
LLM-powered recommendation agents, despite their reasoning prowess, are easily manipulated by contextual biases in high-stakes scenarios like paper review and job recruitment.
LLMs armed with RAG can reconstruct cyberattacks with high precision and recall, but the best model for the job depends on your budget: DeepSeek V3 matches Claude Sonnet 4's accuracy at 1/15th the cost.
Forget chasing leaderboard hype: this study reveals that larger embedding models and strategic concatenation are key to unlocking LLM-powered tabular prediction, regardless of public rankings.
No training needed: ARAM dynamically adjusts retrieved context guidance in masked diffusion models based on signal quality, resolving retrieval-prior conflicts on the fly.
Retrieval-augmented LLM agents can learn to learn from experience, achieving significantly better generalization on unseen tasks by combining the strengths of fine-tuning and in-context retrieval.
Discover emergent narratives in real-time without predefined labels, revealing how information evolves during crises.
Stop chasing leaderboard gains on generic benchmarks: PJB reveals that domain-specific weaknesses in person-job retrieval far outweigh the benefits of general model upgrades, and that query understanding modules can actually hurt performance.
LLMs can now recommend drugs with state-of-the-art accuracy by synthesizing individual patient context with the prescribing tendencies of similar cases, outperforming guideline-based and similar-patient retrieval methods.
Forget subjective scouting reports: this framework objectively identifies undervalued football players by blending market dynamics with news sentiment, offering a data-driven edge in talent acquisition.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
Forget full finetuning: OPERA's dynamic pruning lets you adapt retrieval models to new domains with better ranking and recall, in half the time.
Temporal CNNs and LSTMs can slash inventory costs and boost fill rates compared to traditional forecasting methods, offering a tangible advantage for supply chain optimization.
Symbolic planning unlocks significant gains in RTL synthesis and summarization, boosting LLM performance by 20% without fine-tuning.
Forget generic code generation: this work shows that structure-aware retrieval of domain-specific examples slashes the debugging needed to get LLMs to produce working scientific visualization pipelines.
Wikipedia editors can now get AI assistance to identify claims needing citations in 10 languages, improving content reliability at scale.
LLMs struggle with questions requiring up-to-date information, especially when the recency requirement is context-dependent, highlighting a critical gap in temporal reasoning.
Achieve personalized generation with cloud-scale reasoning while preserving user privacy, thanks to a novel asymmetric collaboration framework that's also 2x faster.
CRAG's retrieval evaluator surprisingly relies on named entity alignment, not semantic similarity, to judge document quality.
Off-the-shelf foundation models struggle with instance-level visual product search in industrial settings, often falling short compared to domain-specific models.
LSTM-based intrusion detection can achieve 99.42% accuracy in identifying cyber threats within IoT networks, slightly outperforming CNN-based approaches.
By intelligently injecting and removing noise, RaDAR significantly improves recommendation accuracy in sparse and noisy collaborative filtering environments.
Thompson Sampling gets a major upgrade with C3, outperforming existing methods by 12.4% in click-through rate on the Microsoft News Dataset by better handling non-stationary correlated rewards.
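C3's internals aren't described in the one-line summary above; for context, this is the classical Bernoulli Thompson Sampling baseline it upgrades: sample a plausible click-rate per arm from its Beta posterior and pull the argmax. The counts and prior below are illustrative.

```python
# Classical Bernoulli Thompson Sampling (the baseline, not C3 itself).
import random

def thompson_step(successes, failures, rng=random):
    """Pick an arm index given per-arm success/failure counts."""
    samples = [
        rng.betavariate(s + 1, f + 1)  # Beta(1, 1) uniform prior
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)

rng = random.Random(0)
# Arm 0: ~50% click rate; arm 1: ~5% click rate.
arm = thompson_step([50, 5], [50, 95], rng=rng)
```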
Hyperbolic GNNs on Bitcoin transaction networks need careful tuning of learning rate and curvature to stabilize high-dimensional embeddings, a factor often overlooked.
LLMs can dynamically optimize the training curriculum of multimodal retrieval models, leading to significant gains in retrieval accuracy by adapting to the model's evolving state.
Lightweight LLMs like Gemini 2.0 and GPT-3.5 can extract key metadata from cloud incident reports with surprisingly high accuracy (75-95%), offering a cost-effective alternative to larger models.
Achieve 91%+ Hit@1 retrieval accuracy in a local-first long-term memory system for AI assistants by combining vector recall, keyword recall, RRF, and re-ranking, while maintaining sub-90ms search latency at scale.
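RRF, mentioned in the pipeline above, is a standard technique for merging ranked lists from heterogeneous retrievers (e.g., vector and keyword recall) before re-ranking: each document scores the sum of 1/(k + rank) over the lists it appears in. This sketch uses the conventional k=60 default; function names are illustrative, not from the paper.

```python
# Reciprocal Rank Fusion: merge several ranked lists of doc IDs.
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Score each doc as sum(1 / (k + rank)); return docs best-first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # The fused list would then be handed to a re-ranker.
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d7"]
keyword_hits = ["d1", "d9", "d3"]
fused = rrf_fuse([vector_hits, keyword_hits])
```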
Generative search engines create "answer bubbles" by selectively citing and framing information, leading to divergent information realities compared to traditional search.
Escape the flatland of traditional recommender systems: RecBundle uses differential geometry to disentangle user interactions from preferences, opening the door to understanding and mitigating systemic biases.
E-commerce search LLMs can be made both more knowledgeable and secure via a surprisingly simple three-stage framework of data synthesis, parameter-efficient pre-training, and dual-path alignment.
Unsupervised detection of adversarial attacks in RAG systems is possible using generator activations and uncertainty measures, even without knowing the target prompt.
By adaptively routing medical image queries to global and local feature experts, HMAR achieves state-of-the-art retrieval accuracy without relying on expensive bounding box annotations.
Counterfactual examples supercharge visual in-context learning, enabling smaller vision-language models to outperform larger ones by focusing on causal relationships rather than superficial correlations.
LLMs struggle to selectively apply user preferences stored in memory, often misapplying them even when social norms dictate otherwise, revealing a critical gap in context-aware personalization.
Synthetic benchmarks can't catch the nuances of personalized deep research, as real users revealed nine critical errors that LLM judges missed entirely.
Restaurant recommendations get a flavor upgrade: ReFORM uses LLMs to distill user preferences and item qualities from reviews, then spotlights the decision factors that truly matter.
Instead of just gathering more context, turn retrieval into a mechanism for actively testing and refining a provisional answer, yielding substantial gains in factual QA accuracy.
Achieve state-of-the-art multi-hop question answering by pre-computing bridging facts at index time, eliminating the need for complex online reasoning or graph traversal.
LLMs can now remember and reason about long-term conversations with significantly improved accuracy thanks to a new temporal-aware memory framework that structures dialogue into event calendars.
LLM agents can now leverage a unified memory framework that dynamically adapts to different question types, enabling more coherent and user-centric long-horizon dialogues.
Conformal factuality for RAG breaks down when faced with distribution shifts or distractors, forcing a trade-off between factuality and informativeness.
You can estimate the completeness of a web crawl using only its own historical data, without needing external datasets.
Small language models can achieve surprisingly robust question answering by actively clustering their memories into semantically coherent groups, outperforming standard retrieval methods.
Imperfect knowledge graphs can lead to retrieval drift and hallucinations in multi-hop reasoning, but C2RAG offers a robust solution that improves EM by 3.4% and F1 by 3.9% over existing methods.
Unlock cross-jurisdictional legal analysis by automatically identifying corresponding legal provisions across national systems using multilingual embeddings and XML schema conversions.
Extracting user profiles from recommendation lists is now more accurate thanks to RAPI, a new framework that leverages BERT embeddings and sample augmentation to boost inference accuracy by dynamically weighting user characteristics.
Forget hand-crafted features: this system uses an LLM to automatically discover features from event sequences that outperform even state-of-the-art embeddings by up to 5.8%.
Users on Xiaohongshu are generally happy with the platform's new translation feature, but their creative use of slang, emoji, and coded language highlights the challenges of real-world machine translation.
Combining multiple embedding models and looking for consensus flags just 1% of network records as anomalous, but flags *only* synthetic attacks, enabling security teams to focus on the needle in the haystack.
Reinforcement learning unlocks fast, high-quality consensus ranking aggregation, outperforming classical heuristics and ILP solvers for the NP-hard Kemeny optimization problem.
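The Kemeny objective named above is: find the permutation minimizing total Kendall-tau distance (pairwise disagreements) to the input rankings. Exact optimization is NP-hard, which is why heuristics, ILP, and now RL are used; this brute force only works for tiny item sets and just illustrates the objective being approximated.

```python
# Brute-force Kemeny consensus for a handful of items (illustration only).
from itertools import permutations

def kendall_tau(a, b):
    """Count item pairs ordered differently in rankings a and b."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for i in range(len(a))
        for j in range(i + 1, len(a))
        if pos_b[a[i]] > pos_b[a[j]]
    )

def kemeny_brute_force(rankings):
    items = rankings[0]
    return min(
        permutations(items),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )

votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
consensus = kemeny_brute_force(votes)
```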
Reinforcement learning can now handle active feature selection in high-dimensional datasets by intelligently pruning the feature search space and regularizing decision sequences, outperforming existing methods in accuracy and policy complexity.
Privacy-preserving RAG gets a massive speed boost (3-300x) by ditching secure sorting for an interactive bisection method that also supports arbitrary top-k retrieval.
LLMs can automate up to 90% of radiology report annotations with high accuracy, slashing expert review time.
Even GPT-4 struggles with long-term preference capture in e-commerce, but a lightweight, jointly-trained LLM agent can beat it.
Forget complex LLM-based structuring: simple, deterministic retrieval with smart ranking beats state-of-the-art conversational memory systems while using 8.5x fewer tokens.
Provable guarantees for active seriation offer a sample-efficient route to ordering recovery from noisy pairwise comparisons.
LLMs can plan effective e-commerce searches within strict latency budgets by first probing the retrieval environment to ground their reasoning.
RAG systems readily absorb and amplify ideological biases present in retrieved documents, even more so when prompts explicitly describe the ideological dimensions at play.
Voronoi cells and whitening can be combined to create LiDAR place recognition descriptors that implicitly measure Mahalanobis distance, improving performance on standard benchmarks.
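The Mahalanobis connection above rests on the standard whitening identity (a general fact, not specific to the paper): with covariance $\Sigma$ and whitening matrix $W = \Sigma^{-1/2}$,

```latex
d_M(x, y)^2 = (x - y)^\top \Sigma^{-1} (x - y)
            = \bigl(\Sigma^{-1/2}(x - y)\bigr)^\top \bigl(\Sigma^{-1/2}(x - y)\bigr)
            = \lVert W x - W y \rVert_2^2
```

so plain Euclidean distance between whitened descriptors equals Mahalanobis distance between the originals.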
Tired of RAG evaluation datasets with legal baggage or LLM-hallucinated inconsistencies? OrgForge offers a multi-agent simulation environment that guarantees ground truth, temporal structure, and cross-artifact consistency for realistic corporate scenarios.
LLMs can now achieve state-of-the-art performance in transaction analytics by grounding them with a retrieval-augmented knowledge base of behavioral patterns derived from financial transactions.
Insurance LLM slashes hallucinations to a record-low 0.6% while beating DeepSeek and Gemini, proving you *can* have domain mastery without sacrificing general smarts.
Stop brute-forcing question answering over hybrid data lakes: A.DOT Planner compiles NL queries into DAGs for efficient, multi-hop reasoning across structured and unstructured data, boosting correctness by 14.8%.
Oblivis enables practical, privacy-preserving database queries in cloud and edge settings, achieving up to 10^6x speedups over standard Oblivious Transfer methods.
Capture and preserve the expertise of aging workforces with Expert Mind, a RAG-based system that turns tacit knowledge into a queryable asset.
Surface-level metrics like BLEU are misleading for evaluating dialogue systems, as human and LLM judges reveal critical flaws in coherence and consistency that these metrics miss entirely.
Forget retraining: GenRecEdit injects knowledge about new items into generative recommendation models, boosting cold-start performance by up to 10x while slashing training time by 90%.
Stop manually synthesizing related work: ResearchPilot automates the process with a self-hostable, multi-agent system that extracts, synthesizes, and drafts literature reviews.
Graph-based fraud detection gets a boost with STC-MixHop, a framework that leverages multi-scale neighborhood diffusion and temporal consistency to outperform existing methods, especially when relational dependencies are key.
The pursuit of "open search" risks being co-opted by powerful corporations unless it shifts focus from technical openness to the actual capabilities afforded to users.
Recommendation systems can now systematically debias engagement signals across user, content, and model dimensions using a lightweight, in-model approach, leading to more accurate value models and stable ecosystem dynamics.
Stop wasting compute on query expansion: focusing it on re-ranking with stronger models and deeper candidate pools yields significantly better retrieval performance in reasoning-intensive tasks.
Replace ad-hoc memory decay and similarity metrics with provably convergent Riemannian dynamics and Fisher information, boosting agent memory performance by up to 20% while enabling zero-LLM deployments for data sovereignty.
Text-to-video retrieval models struggle to distinguish videos that differ only in their final state, revealing a critical gap in temporal reasoning and end-state grounding.
LLMs answering medical questions leak surprisingly large amounts of patient information, exposing a critical privacy-utility tradeoff that current benchmarks miss.
You don't need a cloud to ask EHRs questions: surprisingly competitive clinical question answering is possible with commodity hardware and local models.
You can boost fairness in LLM recommenders by up to 74% simply by prompting them to be fair, but watch out for unintended over-promotion of specific groups.
Traditional text embedding benchmarks fail to capture the nuances of long-horizon memory retrieval, but this new benchmark reveals that bigger models don't always win, and performance on standard tasks doesn't guarantee success in complex, context-dependent memory scenarios.
Shrinking a 2B vision-language retriever to a 70M text-only model achieves 95% of the original quality and outperforms a 2B baseline, while slashing query latency by 50x.
Injecting user mood into music recommendation boosts perceived quality, proving that personalized listening experiences can be significantly improved by considering emotional state.
GraphRAG, thought to be more robust to poisoning attacks due to its KG abstraction, is surprisingly vulnerable to KEPo, a novel attack that forges knowledge evolution paths to inject toxic events.
Crypto KOL credibility isn't just about credentials; it's a carefully performed balancing act between psychological needs, community expectations, and ethical self-regulation.
Agentic RAG systems can be made significantly more efficient and accurate simply by adding a contextualization module and de-duplicating retrieved documents at test time.
A simple modification to Dijkstra, Transfer Aware Dijkstra (TAD), doubles the speed of public transit routing while correctly handling buffer times, outperforming state-of-the-art RAPTOR-based algorithms.
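A minimal sketch of the idea the summary describes, assuming TAD's key move is to charge a buffer time whenever a route switches lines (which requires a (node, line) state space rather than plain nodes). The example network is illustrative; TAD's actual formulation may differ.

```python
# Dijkstra over (node, line) states, with a transfer buffer on line changes.
import heapq

def transfer_aware_dijkstra(edges, source, target, buffer_time):
    """edges: list of (u, v, line, travel_time). Returns min arrival cost."""
    adj = {}
    for u, v, line, t in edges:
        adj.setdefault(u, []).append((v, line, t))
    best = {}                       # (node, line) -> best cost seen
    heap = [(0, source, None)]      # line=None means "not boarded yet"
    while heap:
        cost, node, line = heapq.heappop(heap)
        if node == target:
            return cost
        if best.get((node, line), float("inf")) <= cost:
            continue
        best[(node, line)] = cost
        for nxt, nxt_line, t in adj.get(node, []):
            # Changing lines costs an extra buffer_time.
            penalty = buffer_time if line is not None and nxt_line != line else 0
            heapq.heappush(heap, (cost + penalty + t, nxt, nxt_line))
    return float("inf")

edges = [
    ("A", "B", "red", 5),
    ("B", "C", "red", 5),
    ("B", "C", "blue", 2),  # faster, but switching red->blue costs a buffer
]
cost = transfer_aware_dijkstra(edges, "A", "C", buffer_time=4)
```

With the buffer, staying on the red line (cost 10) beats the nominally faster blue hop (5 + 4 + 2 = 11); with buffer_time=0 the blue hop wins.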
Forget brittle retrieval: QChunker uses a question-aware multi-agent debate to restructure RAG from retrieval-augmentation to *understanding*-retrieval-augmentation, boosting performance across diverse domains.