May 1 – May 8, 2026

Recommendation & Information Retrieval - Weekly Roundup

47 papers published across 4 labs.

Top Papers

May 6, 2026

Universidad Autónoma de Madrid·2w ago

MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education

Forget bulky atlases and unreliable image searches: MIRAGE offers medical students a free, interactive tool to retrieve, generate, and understand medical images using only open-source models.

Miguel Díaz Benito, Cecilia Diana-Albelda, Álvaro García-Martín +3

Data Curation & Synthetic Data Multimodal Models Recommendation & Information Retrieval

2w ago·also CUHK, HKU, University of California

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

OpenSearch-VL offers a fully transparent recipe for training state-of-the-art multimodal search agents, finally democratizing access to a capability previously locked behind closed doors.

Shuang Chen, Kaituo Feng, Hangting Chen +7

Multimodal Models Recommendation & Information Retrieval Tool Use & Agents

2w ago·also BAIR, Princeton

Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

Forget scaling laws – the real bottleneck in associative memory isn't storage, it's retrieval: forcing a single "winner" costs you a logarithmic factor in capacity compared to allowing a ranked list.

Nicholas Barnfield, Juno Kim, Eshaan Nichani +2

Architecture Design (Transformers, SSMs, MoE) Recommendation & Information Retrieval

Andreas Pattichis +1·2w ago

Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics

Forget rigid memory structures: Memini lets your LLM's external knowledge evolve organically, learning and forgetting like a brain.

Andreas Pattichis, Constantine Dovrolis

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Recommendation & Information Retrieval

Vasilis Perifanis +4·2w ago

Federated Learning for Early Prediction of EV Charging Demand

You can predict EV charging demand surprisingly well using only the first few minutes of a charging session, opening the door to real-time grid optimization.

Vasilis Perifanis, Foteini Nikolaidou, Nikolaos Pavlidis +2

Recommendation & Information Retrieval

All Papers (47)

May 6, 2026

2w ago·also CUHK, HKU, University of California

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

OpenSearch-VL offers a fully transparent recipe for training state-of-the-art multimodal search agents, finally democratizing access to a capability previously locked behind closed doors.

Shuang Chen, Kaituo Feng, Hangting Chen +7

Multimodal Models Recommendation & Information Retrieval Tool Use & Agents

2w ago·also BAIR, Princeton

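Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

Forget scaling laws – the real bottleneck in associative memory isn't storage, it's retrieval: forcing a single "winner" costs you a logarithmic factor in capacity compared to allowing a ranked list.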

Nicholas Barnfield, Juno Kim, Eshaan Nichani +2

Architecture Design (Transformers, SSMs, MoE) Recommendation & Information Retrieval

Andreas Pattichis +1·2w ago

Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics

Forget rigid memory structures: Memini lets your LLM's external knowledge evolve organically, learning and forgetting like a brain.

Andreas Pattichis, Constantine Dovrolis

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Recommendation & Information Retrieval

Vasilis Perifanis +4·2w ago

Federated Learning for Early Prediction of EV Charging Demand

You can predict EV charging demand surprisingly well using only the first few minutes of a charging session, opening the door to real-time grid optimization.

Vasilis Perifanis, Foteini Nikolaidou, Nikolaos Pavlidis +2

Recommendation & Information Retrieval

Wenjing Liu +2·2w ago

A Biased Nonnegative Block Term Tensor Decomposition Model for Dynamic QoS Prediction

Overcome limitations in capturing complex user-service dependencies with a novel tensor decomposition method that significantly boosts QoS prediction accuracy.

Wenjing Liu, Yujia Lei, Qu Wang

Natural Language Processing Recommendation & Information Retrieval

Shereen Elsayed +3·2w ago

Rethinking Convolutional Networks for Attribute-Aware Sequential Recommendation

Ditch the attention: ConvRec proves convolutional networks can beat Transformers in sequential recommendation while slashing compute and memory costs.

Shereen Elsayed, N. Le, Ahmed Rashed +1

Architecture Design (Transformers, SSMs, MoE) Recommendation & Information Retrieval Training Efficiency & Optimization

2w ago

LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

Forget dumb context stuffing: LongSeeker shows that strategically *editing* its own memory lets agents solve web search tasks with far greater reliability.

Yijun Lu, Rui Ye, Yuwen Du +3

Reasoning & Chain-of-Thought Recommendation & Information Retrieval Tool Use & Agents

Oracle Corporation·2w ago

Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement

Forget relying on LLMs to judge themselves: this "Concept Field" approach uses vector math on text corpora to detect hallucinations and novelty, offering a fast, interpretable, and black-box alternative.

Nicholas S. Kersting, Vittorio Castelli, Chieh Ting Yeh +2

Natural Language Processing Recommendation & Information Retrieval

Joshua Adler +1·2w ago

Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall

Ditch the vector DB – this new agent architecture achieves SOTA memory recall by storing everything verbatim and optimizing retrieval, all in a single SQLite file.

Joshua Adler, Guy Zehavi

Architecture Design (Transformers, SSMs, MoE) Recommendation & Information Retrieval Tool Use & Agents

Stefano Cecconello +4·2w ago

From Beats to Breaches: How Offensive AI Infers Sensitive User Information from Playlists

Your innocent Spotify playlists are leaking surprisingly accurate predictions about your age, habits, and even personality traits, thanks to a new AI attack.

Stefano Cecconello, Mauro Conti, Luca Pajola +2

Recommendation & Information Retrieval Red-Teaming & Adversarial Robustness Speech & Audio

Siqiao Xue +6·2w ago

Beyond Retrieval: A Multitask Benchmark and Model for Code Search

Developer-style keyword searches completely nullify the advantage of even the best code embedding models, highlighting a critical gap in current code search techniques.

Siqiao Xue, Zihan Liao, Jin Qin +4

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Recommendation & Information Retrieval

Xinyi Li +7·2w ago

HeterSEED: Semantics-Structure Decoupling for Heterogeneous Graph Learning under Heterophily

HeterSEED achieves state-of-the-art performance on heterophilic heterogeneous graphs by decoupling semantic and structural information, offering a more robust approach than relying on feature similarity alone.

Xinyi Li, Ming Li, Lu Bai +5

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Recommendation & Information Retrieval

2w ago·also Ant Group, PolyU

TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

TabEmbed leapfrogs existing text embedding models to achieve SOTA performance on tabular data by reformulating tasks as semantic matching problems and using contrastive learning.

Minjie Qiang, Mingming Zhang, Xiaoyi Bao +5

Eval Frameworks & Benchmarks Natural Language Processing Recommendation & Information Retrieval

Institut Teknologi Sumatera South·2w ago·also Department of Data Science Institut, Institut Teknologi Sumatera Lampung

Sentiment Analysis and Customer Satisfaction Prediction on E-Commerce Platforms Based on YouTube Comments Using the XGBoost Algorithm

E-commerce sentiment analysis is surprisingly influenced by socio-political terminology, impacting the accuracy of customer satisfaction prediction models.

Ridho Benedictus Togi Manik, Muhammad Aqil Ramadhan, Ihsan Maulana Yusuf +3

Natural Language Processing Recommendation & Information Retrieval

2w ago

CHE-TKG: Collaborative Historical Evidence and Evolutionary Dynamics Learning for Temporal Knowledge Graph Reasoning

State-of-the-art temporal knowledge graph reasoning is now possible by jointly modeling historical evidence and evolutionary dynamics, unlocking complementary predictive signals.

Shuai Lei, Xiaobin Zhu, Jiarui Liang +3

Natural Language Processing Reasoning & Chain-of-Thought Recommendation & Information Retrieval

ETH·2w ago

Graph-Augmented LLMs for Swiss MP Ideology Prediction

Political ideology prediction gets a boost: injecting LLMs with knowledge graphs of MP relationships significantly improves accuracy.

Natural Language Processing Recommendation & Information Retrieval

Zhipeng Song +8·2w ago

CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation

RAG systems can be significantly improved by reranking documents based on how much they increase the LLM's confidence, not just their relevance.

Zhipeng Song, Yizhi Zhou, Xiangyu Kong +6

Natural Language Processing Recommendation & Information Retrieval

Zhenliang Zhang +6·2w ago

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

Achieve 8x token reduction in million-token document understanding without sacrificing accuracy by having the LLM actively search for relevant information like a foraging animal.

Zhenliang Zhang, Wenqing Wang, Yong Hu +4

Reasoning & Chain-of-Thought Recommendation & Information Retrieval Tool Use & Agents

2w ago

DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation

Stop hand-crafting QA datasets for evaluating RAG systems: DoGMaTiQ automates the process with surprisingly high correlation to human judgment, even across languages.

Bryan Li, W. Walden, Yu Hou +6

Eval Frameworks & Benchmarks Natural Language Processing Recommendation & Information Retrieval

2w ago

How Does Chunking Affect Retrieval-Augmented Code Completion? A Controlled Empirical Study

Turns out, chunking code by function is the *worst* way to do retrieval-augmented code completion.

Xinjian Wu, Jingzhi Gong, Gunel Jahangirova +1

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Recommendation & Information Retrieval

Wenjun Yu +2·2w ago

One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving

Generative recommenders can slash latency by up to 38% simply by dynamically juggling GPU memory between embedding and KV caches, a feat current systems miss.

Wenjun Yu, Shuguang Han, Amelie Chi Zhou

Distributed Systems & Hardware Inference & Quantization Recommendation & Information Retrieval

Wenzhuo Cheng +6·2w ago

CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation

Generative recommendation gets a boost: CapsID's soft-routed semantic IDs outperform hard-quantized baselines and even rival sparse-dense hybrids, all while slashing inference latency by nearly half.

Wenzhuo Cheng, Menghang Gong, Qixin Guo +4

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Recommendation & Information Retrieval

Independent·2w ago

AllSERP: Exhaustive Per-Element Enrichment of the Versatile AdSERP Dataset

Fine-grained analysis of user behavior on search engine results pages is now possible thanks to AllSERP, which adds exhaustive per-element annotations to the AdSERP dataset, covering organic results and widgets in addition to ads.

K. Andrew Edmonds

Computer Vision Data Curation & Synthetic Data Recommendation & Information Retrieval

2w ago·also ZJU

Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation

LLMs for recommendation can now surpass the limitations of static training signals, achieving sustained improvements in ranking accuracy, fairness, and diversity through a dynamically updated Bayesian distillation target.

Ruijun Chen, Chongming Gao, Jiawei Chen +2

Natural Language Processing Recommendation & Information Retrieval

UW·2w ago·also PKU, SCU

Interests Burn-down Diffusion Process for Personalized Collaborative Filtering

Forget Gaussian noise - modeling the *decay* of user interest with a custom "burn-down" diffusion process unlocks better personalized recommendations.

Yifang Qin, Zhaobin Li, Arisa Watanabe +2

Recommendation & Information Retrieval

DAMO·2w ago·also PolyU, SCU

RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation

On-device LLMs can now drive real-time recommendation improvements, unlocking faster adaptation to evolving user intent without cloud reliance.

Bin Zhang, Weipeng Huang, Dimin Wang +8

Inference & Quantization Natural Language Processing Recommendation & Information Retrieval

Universidad Autónoma de Madrid·2w ago

MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education

Forget bulky atlases and unreliable image searches: MIRAGE offers medical students a free, interactive tool to retrieve, generate, and understand medical images using only open-source models.

Miguel Díaz Benito, Cecilia Diana-Albelda, Álvaro García-Martín +3

Data Curation & Synthetic Data Multimodal Models Recommendation & Information Retrieval

May 5, 2026

2w ago·also Antwerp

APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music

Aesthetic quality unlocks better generalization in AI-generated music popularity prediction, beating models trained solely on engagement metrics.

Jaavid Aktar Husain, Jaavid Aktar Husain, Dorien Herremans +1

Recommendation & Information Retrieval Speech & Audio

Yilun Zhao +5·2w ago

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

Standard retriever evaluations hide critical weaknesses in agentic search systems, but a new benchmark and training method exposes and addresses these flaws.

Yilun Zhao, Jinbiao Wei, Tingyu Song +3

Reasoning & Chain-of-Thought Recommendation & Information Retrieval Tool Use & Agents

Qiyao Wang +13·2w ago

PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination

LLMs struggle to navigate the complex, multi-turn justification and response dynamics of real-world patent examination, revealing critical gaps in legal reasoning and technical novelty judgment.

Qiyao Wang, Qiyao Wang, Xinyi Chen +11

Eval Frameworks & Benchmarks Natural Language Processing Recommendation & Information Retrieval

2w ago

Reproducing Complex Set-Compositional Information Retrieval

Neural retrievers, despite their success on standard benchmarks, fail spectacularly when forced to reason about set-theoretic constraints, revealing a reliance on spurious correlations rather than true compositional understanding.

Vincent Degenhart, Dewi Timman, Arjen P. de Vries +2

Eval Frameworks & Benchmarks Reasoning & Chain-of-Thought Recommendation & Information Retrieval

Kazan Federal University·2w ago·also Automation and Information Technologies, Department of Automated Systems for Data, Department of Data Analysis and Programming, Dmukhtasibovich -Doctor of Physical and Mathematical +5

Natural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHF

Learn to build and evaluate your own NLP pipeline, from tokenization to RLHF, using open-weight models and reproducible research practices.

Mullosharaf K. Arabov

Natural Language Processing Recommendation & Information Retrieval RLHF & Preference Learning

Camilla Quaresmini +5·2w ago

Beyond Distributive Justice: Hermeneutical Fairness in Ad Delivery

Online advertising can harm users not just through unequal distribution of opportunities, but also by systematically depriving certain groups of relevant concepts or saturating them with skewed framings.

Camilla Quaresmini, Valentina Breschi, Jessica Leoni +3

Constitutional AI & AI Ethics Natural Language Processing Recommendation & Information Retrieval

Tejas D. Kulkarni +2·2w ago

Membership Inference Attacks for Retrieval Based In-Context Learning for Document Question Answering

Retrieval-augmented in-context learning, despite its benefits, leaks surprising amounts of private data, even when attackers only have access to paraphrased queries.

Tejas D. Kulkarni, Antti Koskela, Laith Zumot

Natural Language Processing Recommendation & Information Retrieval Red-Teaming & Adversarial Robustness

BAIR·2w ago

MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents

Retrieval-augmented LLMs are surprisingly vulnerable to memory poisoning via synonym substitution, a loophole that gradient-based defenses can't close.

Ishrith Gowda

Recommendation & Information Retrieval Red-Teaming & Adversarial Robustness Tool Use & Agents

H. Sedghani +3·2w ago

Decentralized Edge Caching under Budget and Storage Constraints: A Game-Theoretic Approach

Storage scarcity in edge caching doesn't just impact performance, it fundamentally shifts the economic landscape, amplifying inequality among content providers.

H. Sedghani, Zahra Seyedi, Mauro Passacantando +1

Distributed Systems & Hardware Recommendation & Information Retrieval

Sofiene Khiari +2·2w ago

AgenticPosesRanker: An Agentic AI Framework for Physically Grounded Ranking of Protein-Ligand Docking Poses

GPT-5, combined with physics-based tools, can match traditional scoring functions in ranking protein-ligand docking poses, opening avenues for interpretable curation in drug design.

Sofiene Khiari, Amr H. Mahmoud, Markus A. Lill

Recommendation & Information Retrieval Scientific Discovery & Drug Design Tool Use & Agents

Jayr Pereira +2·2w ago

Domain-Adaptive Dense Retrieval for Brazilian Legal Search

Fine-tuning dense retrievers on a mix of domain-specific and general question-answering data achieves surprisingly robust performance across diverse legal search tasks, outperforming models trained solely on legal data.

Jayr Pereira, Roberto A. Lotufo, L. Bonifacio

Data Curation & Synthetic Data Natural Language Processing Recommendation & Information Retrieval

Dong Chen +7·2w ago

Revisiting General Map Search via Generative Point-of-Interest Retrieval

LLMs can now directly generate relevant Point-of-Interest (POI) candidates for map search by encoding both semantic and geographic context, outperforming traditional retrieval methods.

Dong Chen, Shuai Zheng, Hao Shao +5

Natural Language Processing Recommendation & Information Retrieval

FIZ Karlsruhe·2w ago·also NII, University of Göttingen

Aspect-Aware Content-Based Recommendations for Mathematical Research Papers

LLMs alone can't capture the nuances of mathematical research, but injecting aspect-aware information into a heterogeneous GNN unlocks surprisingly effective paper recommendations.

Ankit Satpute, André Greiner-Petter, Noah Gießing +4

Natural Language Processing Recommendation & Information Retrieval Scientific Discovery & Drug Design

Jing Qiu +2·2w ago

SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective Retrieval-Augmented Generation

RAG systems can now reduce unsafe answers by 37% using SURE-RAG, a transparent evidence verification method that outperforms even GPT-4o in controlled sufficiency tasks.

Jing Qiu, Zeyu Han, Chengen Huang

Eval Frameworks & Benchmarks Natural Language Processing Recommendation & Information Retrieval

Negar Arabzadeh +3·2w ago

RAG over Thinking Traces Can Improve Reasoning Tasks

RAG's reputation for being ineffective in reasoning tasks is shattered by showing that retrieving the right data – intermediate "thinking traces" – unlocks substantial performance gains, even for state-of-the-art models.

Negar Arabzadeh, Wenjie Ma, Sewon Min +1

Code Generation & Program Synthesis Reasoning & Chain-of-Thought Recommendation & Information Retrieval

Venkata Krishna Prasanth Budigi +1·2w ago

Ditch the brittle RAG stack: a unified PostgreSQL data layer slashes latency by up to 92% and eliminates data leakage, making production RAG finally reliable.

Venkata Krishna Prasanth Budigi, Siri Chandana Sirigiri

Data Curation & Synthetic Data Recommendation & Information Retrieval

Dhruv Gulwani +3·2w ago

TeamUp: Semantic Project Matching and Team Formation for Learning at Scale

Manual student-to-project matching is dead: TeamUp forms better, more diverse teams at scale for pennies per student.

Dhruv Gulwani, Basem Suleiman, Aditya Joshi +1

Natural Language Processing Recommendation & Information Retrieval

May 2, 2026

2w ago·also HKU, Tencent AI

Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation

Forget sifting through walls of text – now you can pinpoint exactly where the AI found its answer, down to the pixel, even in complex visuals like charts and diagrams.

Peiyang Liu, Ziqiang Cui, Xi Wang +2

Computer Vision Multimodal Models Recommendation & Information Retrieval

May 1, 2026

Zi-qiang Zhao +1·3w ago

Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

Tree-based RAG gets a major upgrade: $\Psi$-RAG's adaptive hierarchical index and multi-granular retrieval agent leapfrog existing methods on complex, cross-document reasoning tasks.

Zi-qiang Zhao, Menglin Yang

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Recommendation & Information Retrieval

Massimo Rondelli +2·3w ago

BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis

LLMs can now generate 70% syntactically correct and geometrically consistent 3D objects from text, thanks to retrieval-augmented code synthesis.

Massimo Rondelli, Francesco Pivi, Maurizio Gabbrielli

Code Generation & Program Synthesis Multimodal Models Recommendation & Information Retrieval