Recommendation & Information Retrieval
Applications: search systems, recommendation engines, retrieval-augmented generation, dense retrieval, and ranking models.
Recent Papers
The paper introduces SAGEO Arena, a realistic evaluation environment for Search-Augmented Generative Engine Optimization (SAGEO) that addresses limitations of existing benchmarks by incorporating a full generative search pipeline over a large-scale corpus of web documents with rich structural information. The authors demonstrate that existing optimization approaches are often impractical and degrade performance in retrieval and reranking stages under realistic conditions. The study highlights the importance of structural information and stage-specific optimization for effective SAGEO.
Introduces SAGEO Arena, a novel benchmark environment enabling realistic, stage-level evaluation of search-augmented generative engine optimization strategies.
This paper introduces TopoFair, a benchmarking framework for fair link prediction that focuses on the impact of diverse topological biases beyond homophily. They formalize a taxonomy of topological bias measures and develop a graph generation method that allows for controlled variation of these biases while maintaining real-world graph characteristics. Through empirical evaluation of link prediction models, including fairness-aware methods, they demonstrate the sensitivity of fairness interventions to these structural biases.
Introduces a novel benchmarking framework, TopoFair, to analyze the interplay between topological biases and fairness in link prediction.
This paper studies bandit learning in two-sided matching markets where agents and firms conduct interviews to learn preferences. The authors introduce strategic deferral, allowing firms to delay hiring decisions and recover from suboptimal matches, and model interviews as low-cost hints that reveal partial preference information. They develop novel algorithms for centralized and decentralized settings that achieve time-independent regret, improving upon logarithmic regret bounds for learning stable matchings without interviews.
Introduces strategic deferral for firms in matching markets, enabling decentralized learning and recovery from suboptimal hires.
The paper introduces RELATE, a reinforcement learning framework for end-to-end advertising text generation that directly optimizes for conversion-oriented metrics and compliance constraints. RELATE integrates performance and compliance objectives into the text generation process via policy learning, moving beyond the traditional two-stage generation and alignment paradigm. Experiments on industrial datasets and online deployment show that RELATE significantly improves click-through conversion rate (CTCVR) while adhering to policy constraints.
Introduces an end-to-end reinforcement learning framework, RELATE, that unifies advertising text generation with conversion-oriented objective alignment and compliance constraints.
The paper introduces IncompeBench, a new benchmark for Music Information Retrieval (MIR) consisting of 1,574 permissively licensed music snippets, 500 diverse queries, and over 125,000 relevance judgements. This benchmark addresses the lack of high-quality evaluation datasets in MIR, enabling more rigorous and reproducible research. High inter-annotator agreement was achieved through a multi-stage annotation pipeline, ensuring data quality.
Provides IncompeBench, a permissively licensed, fine-grained benchmark dataset to facilitate advancements in music information retrieval.
This paper introduces a subword embedding approach to detect lexical and orthographic variation in user-generated text, specifically addressing the challenges of "noisy" and low-resource settings without relying on normalization or predefined variant lists. The method trains subword embeddings on raw Luxembourgish user comments and clusters related forms using a combination of cosine similarity and n-gram similarity. The results demonstrate the effectiveness of distributional modeling in uncovering meaningful patterns of variation, aligning with existing dialectal and sociolinguistic research.
Introduces a novel subword embedding method that automatically discovers and clusters lexical variations in user-generated text, even in low-resource languages, without requiring prior normalization or predefined variant lists.
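The combination of cosine and n-gram similarity described above can be sketched in a few lines. This is a toy illustration, not the paper's method: hashed character-trigram vectors stand in for trained subword embeddings, and `alpha` and `threshold` are made-up weights; only the idea of clustering forms by a blended similarity score follows the summary.

```python
import math
from itertools import combinations

def char_ngrams(word, n=3):
    w = f"<{word}>"
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def ngram_sim(a, b):
    # Jaccard overlap of character trigrams (the "n-gram similarity" part).
    A, B = char_ngrams(a), char_ngrams(b)
    return len(A & B) / len(A | B) if A | B else 0.0

def hash_vec(word, dim=64):
    # Toy stand-in for a trained subword embedding: hash trigrams into a vector.
    v = [0.0] * dim
    for g in char_ngrams(word):
        v[hash(g) % dim] += 1.0
    return v

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_variants(words, alpha=0.5, threshold=0.4):
    # Greedy single-link clustering on the combined similarity score,
    # merged with a tiny union-find.
    vecs = {w: hash_vec(w) for w in words}
    parent = {w: w for w in words}
    def find(w):
        while parent[w] != w:
            w = parent[w]
        return w
    for a, b in combinations(words, 2):
        score = alpha * cosine(vecs[a], vecs[b]) + (1 - alpha) * ngram_sim(a, b)
        if score >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())
```

For example, `cluster_variants(["haus", "hauss", "buch"])` groups the two spelling variants together while leaving the unrelated word apart.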
The paper introduces RI-Mamba, a rotation-invariant state-space model for text-to-shape retrieval that addresses the limitations of existing methods in handling objects with arbitrary orientations and diverse categories. RI-Mamba disentangles pose from geometry using global and local reference frames and Hilbert sorting to create rotation-invariant token sequences. The model incorporates orientational embeddings via feature-wise linear modulation and employs cross-modal contrastive learning with automated triplet generation for scalable training, achieving state-of-the-art results on the OmniObject3D benchmark.
Introduces a novel rotation-invariant state-space model, RI-Mamba, for robust text-to-shape retrieval by disentangling pose from geometry and incorporating orientational embeddings.
The paper introduces ULTRA, a transformer-based recommendation architecture for Urdu, a low-resource language, to improve personalized news retrieval. ULTRA employs a dual-embedding architecture with a query-length aware routing mechanism to handle varying query lengths, directing queries to either title/headline-level or full-content pipelines. Experiments on a large Urdu news corpus demonstrate that ULTRA achieves over 90% precision, outperforming single-pipeline baselines in recommendation relevance.
Introduces a query-adaptive dual-embedding architecture for semantic content recommendation in low-resource languages, dynamically routing queries based on length to optimize retrieval relevance.
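The routing idea can be sketched as follows. This is illustrative only: the 4-token threshold and the pipeline names are assumptions, not values from the paper, which presumably tunes or learns the routing decision.

```python
def route_query(query, short_max_tokens=4):
    # Short queries tend to match headline-level embeddings; longer,
    # sentence-like queries benefit from full-content embeddings.
    n_tokens = len(query.split())
    return "title" if n_tokens <= short_max_tokens else "content"

def retrieve(query, title_search, content_search):
    # Dispatch to the title/headline-level or full-content pipeline.
    pipeline = route_query(query)
    search = title_search if pipeline == "title" else content_search
    return pipeline, search(query)
```

A two-word query is routed to the title index, while a full question goes to the content index.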
The paper introduces Multi-Level Compression Cross Networks (MLCC) and its multi-channel extension (MC-MLCC) to efficiently model high-order feature interactions in recommender systems. MLCC uses hierarchical compression and dynamic composition to capture feature dependencies with favorable computational complexity, while MC-MLCC decomposes feature interactions into parallel subspaces for efficient horizontal scaling. Experiments on public and industrial datasets demonstrate that MLCC and MC-MLCC outperform DLRM-style baselines, achieving up to 0.52 AUC improvement and up to 26x reduction in parameters and FLOPs, and the approach has been adopted in Bilibili's advertising system.
Introduces a novel feature interaction architecture, MLCC, that uses hierarchical compression and dynamic composition to efficiently capture high-order feature interactions, along with its multi-channel extension, MC-MLCC, for improved scalability.
The paper introduces CitiLink-Minutes, a novel multilayer dataset of 120 European Portuguese municipal meeting minutes from six municipalities, designed to address the lack of annotated datasets for NLP and IR research in this domain. The dataset features over one million tokens with de-identified personal information and includes manual annotations across metadata, subjects of discussion, and voting outcomes. Experiments demonstrate the dataset's utility for downstream tasks like metadata extraction, topic classification, and vote labeling, facilitating transparent access to municipal decisions.
Contributes CitiLink-Minutes, a unique multilayer annotated dataset of municipal meeting minutes, enabling NLP and IR research on local governance.
The paper investigates how best to pretrain small language models (SLMs) to decide which tokens to predict directly and which to delegate to an external source via a special token. The authors find that loss alone is insufficient for determining optimal delegation, as some high-loss tokens represent acceptable alternative continuations. They introduce LaCy, a pretraining method that uses a spaCy grammar parser to augment the loss signal, enabling SLMs to learn when to delegate and resulting in improved FactScore in cascaded generation setups compared to other methods.
Introduces LaCy, a pretraining method that leverages a spaCy grammar parser to augment the loss signal, enabling SLMs to learn when to delegate token prediction to an external source.
The paper introduces "analytical search" as a new search paradigm tailored for complex analytical information needs, addressing the limitations of relevance-based ranking and retrieval-augmented generation (RAG) in tasks requiring trend analysis, causal inference, and verifiable conclusions. It proposes a system framework that integrates query understanding, recall-oriented retrieval, reasoning-aware fusion, and adaptive verification to support structured, multi-step inference. The authors argue that analytical search offers improved control over reasoning, evidence usage, and verifiability, leading to more accountable and utility-driven results compared to existing search paradigms.
Introduces and formalizes the concept of "analytical search" as a distinct search paradigm designed to address complex analytical information needs by emphasizing evidence-governed, process-oriented workflows.
The paper introduces SIGHT, a reinforcement learning framework designed to improve search-based reasoning in LLMs by mitigating redundancy and noise in search results. SIGHT uses Self-Evidence Support (SES) to distill search results into high-fidelity evidence and employs an Information Gain score to identify pivotal states for Dynamic Prompting Interventions like de-duplication and adaptive branching. By integrating SES and correctness rewards via Group Relative Policy Optimization, SIGHT achieves superior performance on single-hop and multi-hop QA benchmarks with fewer search steps compared to existing methods.
Introduces a novel reinforcement learning framework, SIGHT, that leverages self-evidence support and information-gain driven diverse branching to enhance search-based reasoning in LLMs.
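One standard way to instantiate an information-gain score like SIGHT's is as an entropy reduction over the model's answer distribution; the sketch below uses that generic definition, which may differ from the paper's exact formulation.

```python
import math

def entropy(dist):
    # Shannon entropy (bits) of a discrete distribution.
    return -sum(p * math.log2(p) for p in dist if p > 0)

def information_gain(prior, posterior):
    # Gain of a search step = how much it sharpens the answer distribution.
    # A near-zero gain flags a redundant step -- a natural trigger for
    # interventions such as de-duplication or adaptive branching.
    return entropy(prior) - entropy(posterior)
```

A step that moves a uniform belief toward one answer yields positive gain; a step that leaves the distribution unchanged yields zero.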
The paper introduces AlphaPROBE, a novel framework for alpha factor mining in quantitative finance that represents the factor pool as a Directed Acyclic Graph (DAG) to capture the evolutionary relationships between factors. AlphaPROBE employs a Bayesian Factor Retriever to identify promising seed factors and a DAG-aware Factor Generator to produce context-aware and non-redundant optimizations based on the full ancestral trace of factors. Experiments on Chinese stock market datasets demonstrate that AlphaPROBE outperforms existing methods in predictive accuracy, return stability, and training efficiency by leveraging the global evolutionary topology.
Introduces a DAG-based framework for alpha factor mining that explicitly models the evolutionary relationships between factors to improve search efficiency and factor diversity.
This paper investigates the phenomenon of "token overflow" in soft compression architectures for retrieval-augmented generation (RAG), where compressed token representations lose task-relevant information. The authors propose a methodology to characterize and detect token overflow, evaluating it within the xRAG framework. Their key finding is that lightweight probing classifiers, leveraging both query and context xRAG representations, achieve an average AUC-ROC of 0.72 in detecting overflow across HotpotQA, SQuADv2, and TriviaQA datasets, demonstrating the importance of query-aware detection.
Introduces a methodology using lightweight probing classifiers to detect token overflow in compressed token representations for retrieval-augmented generation by leveraging query and context information.
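A minimal version of such a probe needs no ML libraries. Everything below is a stand-in: two-dimensional synthetic features in place of concatenated query/context xRAG representations, and a hand-rolled logistic regression in place of whatever classifier the authors use; only the probe-plus-AUC evaluation pattern follows the summary.

```python
import math
import random

def train_probe(X, y, lr=0.5, epochs=200):
    # Hand-rolled logistic regression; in the query-aware setup each row of X
    # would concatenate the query and context representations.
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))
            g = p - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))

def auc_roc(scores, labels):
    # AUC-ROC as P(random positive scores above random negative).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic "overflow" data: positives drift along the first feature.
random.seed(0)
X = [[random.gauss(1, 0.5), random.gauss(0, 1)] for _ in range(40)] + \
    [[random.gauss(-1, 0.5), random.gauss(0, 1)] for _ in range(40)]
y = [1] * 40 + [0] * 40
w, b = train_probe(X, y)
auc = auc_roc([predict(w, b, x) for x in X], y)
```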
This paper introduces a French-focused benchmark for PDF-to-Markdown conversion using VLMs, addressing the lack of evaluation datasets for non-English documents and the over-penalization of formatting variations in existing benchmarks. The benchmark consists of challenging French documents selected via model-disagreement sampling and is evaluated using unit-test-style checks targeting specific failure modes like text presence and reading order, combined with category-specific normalization. Results across 15 models show that proprietary models exhibit higher robustness on handwriting and forms, while open-weight models are competitive on standard layouts.
Introduces a new French-language PDF-to-Markdown benchmark with targeted unit tests and category-specific normalization to more accurately assess VLM performance in RAG pipelines.
The paper introduces Hi-SAM, a novel multi-modal recommendation framework designed to address limitations in semantic ID-based approaches, specifically suboptimal tokenization and architecture-data mismatch. Hi-SAM employs a Disentangled Semantic Tokenizer (DST) that uses geometry-aware alignment and coarse-to-fine quantization to separate shared and modality-specific semantics, and a Hierarchical Memory-Anchor Transformer (HMAT) that incorporates hierarchical positional encoding and anchor tokens to better model user-item interactions. Experiments on real-world datasets and a large-scale social platform demonstrate that Hi-SAM outperforms state-of-the-art baselines, particularly in cold-start scenarios, achieving a 6.55% improvement in a core online metric.
Introduces a hierarchical structure-aware multi-modal framework, Hi-SAM, that disentangles cross-modal semantics and modality-specific details during tokenization and incorporates hierarchical positional encoding within a transformer architecture for improved recommendation performance.
The paper introduces Meta-Sel, a supervised meta-learning approach for efficient demonstration selection in in-context learning, which addresses the challenge of selecting optimal few-shot examples under a limited prompt budget. Meta-Sel learns a scoring function based on TF-IDF cosine similarity and length-compatibility ratio between candidate demonstrations and queries, trained on a meta-dataset constructed from training data using class agreement as supervision. Empirical evaluation across four intent datasets and five LLMs demonstrates that Meta-Sel achieves competitive accuracy and selection-time overhead compared to 12 other demonstration selection methods, especially benefiting smaller models.
Introduces Meta-Sel, a lightweight supervised meta-learning approach that learns a fast, interpretable scoring function for selecting demonstrations for in-context learning.
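The two features named in the summary can be computed directly. Note the hedge: Meta-Sel learns its scoring function from a meta-dataset, whereas the sketch below simply combines the two features with a fixed, made-up weight `alpha`.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    # Plain TF-IDF over whitespace tokens (smoothed idf).
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(d.keys())
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in d.items()} for d in docs]

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def length_compat(query, demo):
    # Ratio of the shorter length to the longer, in [0, 1].
    lq, ld = len(query.split()), len(demo.split())
    return min(lq, ld) / max(lq, ld)

def score_demonstrations(query, candidates, alpha=0.7):
    # Rank candidate demonstrations for the prompt budget.
    vecs = tfidf_vectors(candidates + [query])
    qv = vecs[-1]
    return sorted(
        ((alpha * cosine(qv, dv) + (1 - alpha) * length_compat(query, demo), demo)
         for demo, dv in zip(candidates, vecs[:-1])),
        reverse=True,
    )
```

On a toy intent pool, the lexically closest demonstration ranks first.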
The authors introduce IntTravel, a large-scale dataset with 4.1 billion interactions for integrated travel recommendation, addressing the limitations of existing datasets that focus solely on next POI recommendation. To leverage this dataset, they propose a decoder-only generative framework that balances task collaboration and differentiation through information preservation, selection, and factorization. Experiments demonstrate state-of-the-art performance on IntTravel and another benchmark dataset, with a successful deployment on Amap resulting in a 1.09% CTR increase.
Introduces a large-scale dataset, IntTravel, and a novel generative framework for integrated multi-task travel recommendation, demonstrating improved performance and real-world impact.
The paper introduces LASER, a full-stack optimization framework for efficient long sequence modeling in recommendation systems, addressing I/O and computational bottlenecks. LASER incorporates SeqVault, a hybrid DRAM-SSD indexing strategy, to reduce retrieval latency, and Segmented Target Attention (STA), a novel attention mechanism with a sigmoid-based gating strategy and Global Stacked Target Attention (GSTA), to reduce computational complexity. Online A/B testing showed LASER achieved significant improvements in ADVV and revenue, demonstrating its practical impact.
Introduces a full-stack optimization framework, LASER, featuring SeqVault and Segmented Target Attention (STA), to achieve efficient long sequence modeling for recommendation systems.
This paper investigates the use of Time Series Foundation Models (TSFMs) for forecasting commencing student enrollments in data-sparse higher education settings. The authors introduce the Institutional Operating Conditions Index (IOCI), a novel covariate derived from time-stamped documentary evidence, and combine it with Google Trends data to improve forecast accuracy. Results from an expanding-window backtest demonstrate that covariate-conditioned TSFMs achieve performance comparable to classical benchmarks without institution-specific training, highlighting their potential for zero-shot enrollment forecasting.
Introduces the Institutional Operating Conditions Index (IOCI), a transferable covariate derived from documentary evidence, to enhance TSFM-based enrollment forecasting in data-sparse environments.
The paper introduces a query-focused and memory-aware reranking framework that leverages attention scores from selected heads in large language models to estimate passage-query relevance in a listwise manner. This approach generates continuous relevance scores, allowing training on diverse retrieval datasets and capturing holistic information from the candidate shortlist. Experiments show the method outperforms existing pointwise and listwise rerankers on Wikipedia, long narrative datasets, and the LoCoMo benchmark, achieving state-of-the-art results.
Introduces a novel reranking framework that utilizes attention scores from specific heads to estimate passage-query relevance in a listwise, memory-aware fashion.
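The aggregation step can be illustrated with toy data. The paper selects specific heads and trains on diverse retrieval datasets; the sketch below only shows how attention mass over per-passage token spans turns into continuous listwise scores, with a hand-written attention matrix in place of real LLM activations.

```python
def attention_scores(attn, passage_spans):
    # attn[i][j]: attention from query token i to candidate token j
    # (from selected LLM heads in the paper; toy numbers here).
    # Each passage's score is its mean received attention mass, giving
    # continuous relevance scores over the whole candidate shortlist.
    n_q = len(attn)
    scores = []
    for start, end in passage_spans:
        mass = sum(sum(row[start:end]) for row in attn)
        scores.append(mass / (n_q * (end - start)))
    return scores

def rerank(passages, attn, passage_spans):
    scores = attention_scores(attn, passage_spans)
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in order]
```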
The authors introduce RokomariBG, a large-scale, multi-entity heterogeneous book graph dataset for personalized Bangla book recommendation, addressing the lack of resources in this low-resource language setting. They construct a knowledge graph comprising books, users, authors, categories, publishers, and reviews connected through eight relation types. Through benchmarking experiments on Top-N recommendation using collaborative filtering, matrix factorization, content-based methods, graph neural networks, and neural retrieval models, they demonstrate the dataset's utility and the importance of leveraging multi-relational structure and textual side information, achieving an NDCG@10 of 0.204 with neural retrieval models.
Introduces RokomariBG, a novel large-scale, multi-entity heterogeneous graph dataset for Bangla book recommendation, complete with benchmarking experiments.
The paper introduces Hydra, a repository-level code generation framework that moves away from treating code as natural language and instead leverages its structured nature. Hydra employs a structure-aware indexing strategy using hierarchical trees, a dependency-aware retriever (DAR) to identify true dependencies, and a hybrid retrieval mechanism. Experiments on DevEval and RepoExec benchmarks demonstrate that Hydra achieves state-of-the-art performance, surpassing existing methods by over 5% in Pass@1 and enabling smaller models to outperform larger ones.
Introduces a novel repository-level code generation framework, Hydra, that leverages structure-aware indexing and dependency-aware retrieval to improve performance on complex code generation tasks.
This paper investigates the influence of team dynamics on OSS project selection by surveying 198 OSS practitioners. The study reveals that communication-related team dynamics like responsiveness and clarity are consistently prioritized, but the relative importance varies based on contributor motivations such as gaining reputation or networking. The findings demonstrate that aligning team dynamics with contributor motivations is crucial for understanding project selection behavior and designing better project recommendation systems.
Empirically demonstrates that team dynamics, particularly communication-related aspects, significantly influence OSS project selection, with the relative importance of specific dynamics varying based on contributor motivations.
The paper addresses the cold-start problem in bundle recommendation by proposing EpicCBR, a multi-view contrastive learning framework that leverages user-item (UI) and bundle-item (BI) relations. EpicCBR constructs user profiles by mining item relations and characterizes new bundles using historical bundle information and user preferences. Experiments on three benchmarks demonstrate that EpicCBR significantly outperforms state-of-the-art methods, achieving up to 387% improvement in cold-start scenarios.
Introduces a novel item-relation-enhanced dual-scenario contrastive learning framework (EpicCBR) to improve cold-start bundle recommendation by explicitly modeling user-item and bundle-item relationships.
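Contrastive frameworks like EpicCBR are typically built from InfoNCE-style terms; the generic loss below is that building block only, with an illustrative temperature `tau`, and does not reproduce the paper's multi-view UI/BI objectives.

```python
import math

def info_nce(anchor, positive, negatives, tau=0.1):
    # Generic InfoNCE: pull the anchor toward its positive view and push it
    # away from negatives, via a softmax over temperature-scaled similarities.
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0
    logits = [cos(anchor, positive) / tau] + [cos(anchor, n) / tau for n in negatives]
    m = max(logits)                      # log-sum-exp with max-shift for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_z)
```

The loss is near zero when the anchor matches its positive and large when it matches a negative instead.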
The paper introduces Rec2PM, a generative recommendation framework that compresses long user interaction histories into compact Preference Memory tokens to address the computational cost and noise accumulation challenges of full-attention models. Rec2PM uses a self-referential teacher-forcing strategy, generating reference memories from a global history view to supervise parallelized recurrent updates, enabling fully parallel training and iterative updates during inference. Experiments on large-scale benchmarks demonstrate that Rec2PM achieves superior accuracy with reduced inference latency and memory footprint, functioning as a denoising Information Bottleneck.
Introduces a novel self-referential teacher-forcing strategy for training recurrent preference memory in generative recommendation, enabling parallel training and efficient long-sequence modeling.
The paper introduces DiffusionRank, a novel generative learning-to-rank (LTR) approach based on denoising diffusion that models the joint distribution of feature vectors and relevance labels. This contrasts with traditional discriminative LTR methods that model the conditional probability of relevance given features. By learning the full data distribution, DiffusionRank aims to produce more robust ranking models, achieving significant improvements over discriminative counterparts.
Introduces DiffusionRank, a denoising diffusion-based generative model for learning-to-rank that outperforms discriminative methods.
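The generative-versus-discriminative contrast in this summary can be stated precisely. The identities below are the standard ones (notation mine, not necessarily the paper's):

```latex
% Discriminative LTR fits the conditional relevance distribution
p_\theta(y \mid \mathbf{x}),
% whereas a generative ranker models the joint distribution
p_\theta(\mathbf{x}, y),
% here learned by reversing the standard forward noising process of
% denoising diffusion applied to (feature, label) pairs z_0 = (x, y):
q(\mathbf{z}_t \mid \mathbf{z}_{t-1})
  = \mathcal{N}\big(\mathbf{z}_t;\ \sqrt{1-\beta_t}\,\mathbf{z}_{t-1},\ \beta_t \mathbf{I}\big).
```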
The paper introduces a RAG pipeline and a two-layer prompting strategy to extract actionable recommendations (ReACTs) for improving OSS sustainability from software engineering literature. The authors systematically explore open LLMs and prompting techniques to derive candidate ReACTs from ICSE and FSE papers, followed by a filtering and refinement stage to ensure quality and extract supporting evidence. The pipeline generates 1,922 ReACTs, with 1,312 meeting strict quality criteria, providing a structured and scalable approach to translate research findings into practical guidance for OSS projects.
Introduces a novel RAG pipeline leveraging LLMs to extract and structure evidence-based, actionable recommendations (ReACTs) from software engineering literature for improving OSS project sustainability.
This paper extends the Quantified Boolean Bayesian Network (QBBN) to incorporate negation and backward reasoning, completing Prawitz's simple elimination rules within a probabilistic factor graph framework. It introduces a typed logical language with role-labeled predicates and modal quantifiers, along with a typed slot grammar that deterministically compiles sentences to logical form. The authors demonstrate that while LLMs can assist in disambiguation, grammars are essential for structured parsing, and the QBBN architecture leverages LLMs for annotation and verification in logical information retrieval.
Introduces a complete logical information retrieval system combining LLMs, typed slot grammars, and a QBBN inference engine to reconcile formal semantics with modern language models.
The paper introduces P-GenRM, a personalized generative reward model that addresses limitations in existing personalized reward models by transforming preference signals into structured evaluation chains to derive adaptive personas and scoring rubrics. P-GenRM clusters users into User Prototypes and employs a dual-granularity scaling mechanism, scaling at both the individual and prototype levels to mitigate noise and enhance generalization. Experiments demonstrate state-of-the-art results on personalized reward model benchmarks, with a 2.31% average improvement and a 3% boost from test-time user-based scaling, indicating stronger personalized alignment.
Introduces a personalized generative reward model (P-GenRM) that leverages structured evaluation chains and dual-granularity scaling to improve personalization and generalization in reward modeling for LLMs.
The paper introduces AttentionRetriever, a novel retrieval model designed for long documents that addresses context-awareness, causal dependence, and scope of retrieval limitations in existing RAG systems. AttentionRetriever leverages attention mechanisms and entity-based retrieval to create context-aware embeddings for long documents and determine the relevant retrieval scope. Experiments demonstrate that AttentionRetriever significantly outperforms existing retrieval models on long document retrieval datasets while maintaining the efficiency of dense retrieval methods.
Introduces AttentionRetriever, a novel long document retrieval model using attention and entity-based retrieval to create context-aware embeddings.
The paper addresses the "uncertainty blindness" limitation in generative recommendation models, where models treat all outcomes as equally certain, leading to unstable training and unquantifiable risks. The authors introduce Uncertainty-aware Generative Recommendation (UGR), a framework that incorporates uncertainty as a signal for adaptive optimization via uncertainty-weighted rewards, difficulty-aware optimization, and explicit confidence alignment. Experiments show that UGR improves recommendation performance, stabilizes training, and enables risk-aware applications.
Introduces a unified framework, UGR, that leverages uncertainty signals to improve generative recommendation by adaptively optimizing training based on model confidence, sample difficulty, and explicit confidence alignment.
The paper introduces a reinforcement learning-based web crawling algorithm, SB-CLASSIFIER, designed to efficiently acquire statistical datasets (SDs) from websites. The algorithm addresses the challenge of inefficient or impossible SD retrieval at scale by learning which hyperlinks lead to pages that link to many targets, based on the paths leading to the links in their enclosing webpages. Experiments on large websites demonstrate that SB-CLASSIFIER can retrieve a high fraction of a site's targets while crawling only a small part of the website.
Introduces a novel reinforcement learning-based web crawler, SB-CLASSIFIER, that leverages sleeping bandits to efficiently identify and extract statistical datasets from large websites.
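A sleeping bandit restricts each round's choice to the arms currently available; in a crawler, those are the links present on the current page. The sketch below is illustrative only: SB-CLASSIFIER's arms, rewards, and path features are richer than the two hypothetical link classes used here.

```python
import math
import random

class SleepingUCB:
    """Toy sleeping-bandit link selector: each arm is a link-path pattern,
    and only the arms present on the current page are awake."""
    def __init__(self):
        self.counts, self.rewards, self.t = {}, {}, 0

    def select(self, awake):
        self.t += 1
        for a in awake:                      # play each awake arm once first
            if self.counts.get(a, 0) == 0:
                return a
        def ucb(a):                          # optimism bonus over awake arms only
            return (self.rewards[a] / self.counts[a]
                    + math.sqrt(2 * math.log(self.t) / self.counts[a]))
        return max(awake, key=ucb)

    def update(self, arm, reward):
        self.counts[arm] = self.counts.get(arm, 0) + 1
        self.rewards[arm] = self.rewards.get(arm, 0.0) + reward

# Simulation: "nav" links reach many statistical datasets, "footer" links few;
# footer links are sometimes the only ones on a page.
random.seed(1)
bandit = SleepingUCB()
for _ in range(500):
    awake = ["nav", "footer"] if random.random() < 0.8 else ["footer"]
    arm = bandit.select(awake)
    hit_rate = 0.9 if arm == "nav" else 0.1
    bandit.update(arm, 1.0 if random.random() < hit_rate else 0.0)
```

After a few hundred rounds the crawler concentrates its budget on the productive link class whenever it is available.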
The authors introduce KuaiSearch, a large-scale e-commerce search dataset derived from Kuaishou user interactions, designed to address limitations in existing datasets such as anonymization and single-stage coverage. KuaiSearch includes authentic user queries, natural product texts, and covers cold-start users/long-tail products across recall, ranking, and relevance stages of the search pipeline. Through comprehensive analysis and benchmark experiments, the authors demonstrate KuaiSearch's value for advancing research in real-world e-commerce search, particularly for LLM-based approaches.
Introduces KuaiSearch, a novel large-scale e-commerce search dataset built from real-world Kuaishou user interactions spanning recall, ranking, and relevance stages.
This paper investigates whether online linear optimization (OLO) algorithms are sufficient for achieving strategic robustness in repeated Bayesian first-price auctions. The authors demonstrate that sublinear linearized regret in OLO is sufficient for strategic robustness, enabling the construction of strategically robust no-regret bidding algorithms via black-box reductions. Their reductions yield improved regret bounds compared to prior work, achieving $O(\sqrt{T \log K})$ regret in the known value distribution case and $O(\sqrt{T(\log K + \log(T/\delta))})$ regret in the unknown case, while also removing a bounded density assumption.
Establishes that sublinear linearized regret in online linear optimization is sufficient for achieving strategic robustness in repeated Bayesian first-price auctions, enabling black-box reductions to strategically robust bidding algorithms.
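The linearization argument behind such reductions is standard and worth spelling out; the notation below is generic, not necessarily the paper's:

```latex
% Linearized regret of an OLO algorithm playing x_1,\dots,x_T against
% observed gradients g_t:
\widetilde{R}_T = \max_{u \in \mathcal{X}} \sum_{t=1}^{T} \langle g_t,\, x_t - u \rangle .
% For concave per-round utilities u_t with g_t = \nabla u_t(x_t),
% concavity gives
u_t(u) - u_t(x_t) \le \langle g_t,\, u - x_t \rangle ,
% so the true utility regret is bounded by \widetilde{R}_T: a sublinear
% bound on the linearized quantity transfers to the bidding problem.
```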
This paper addresses the challenge of adapting general-purpose Vision-Language Models (VLMs) to the specific demands of e-commerce product understanding, characterized by attribute-centric data, multiple images, and noise. The authors demonstrate that targeted adaptation of VLMs can significantly enhance e-commerce performance without compromising general multimodal capabilities. They also introduce a new evaluation suite designed for deep product understanding, instruction following, and dynamic attribute extraction.
Demonstrates a strategy for adapting general-purpose VLMs to e-commerce data that improves performance on product understanding tasks while maintaining general multimodal capabilities.
This paper introduces an attribution-guided query rewriting method to improve the robustness of neural retrievers when faced with underspecified or ambiguous queries. The approach computes gradient-based token attributions from the retriever to identify problematic query components and then uses these attributions to guide an LLM in rewriting the query. Experiments on BEIR collections demonstrate that this method consistently improves retrieval effectiveness compared to existing query rewriting and explainability-based techniques, especially for implicit or ambiguous information needs.
Introduces an attribution-guided query rewriting framework that leverages retriever feedback to improve query clarity and retrieval effectiveness.
This paper introduces MemFly, a framework for on-the-fly memory optimization in LLMs based on the information bottleneck principle. MemFly uses a gradient-free optimizer to minimize compression entropy while maximizing relevance entropy, creating a stratified memory structure. The framework incorporates a hybrid retrieval mechanism combining semantic, symbolic, and topological pathways, achieving superior performance in memory coherence, response fidelity, and accuracy compared to existing methods.
Introduces an information bottleneck-based framework, MemFly, for on-the-fly memory optimization in LLMs, enabling efficient compression and precise retrieval.
The paper introduces DREAM, a multi-round debate framework using LLM agents with opposing stances and iterative critique, to address the problem of incomplete relevance labels in IR benchmarks. DREAM achieves 95.2% labeling accuracy with only 3.5% human involvement by using agreement-based debate for accurate labeling and reliable AI-to-human escalation for uncertain cases. Using DREAM, the authors construct BRIDGE, a refined benchmark with 29,824 newly identified relevant chunks, demonstrating that incomplete labels distort retriever rankings and retrieval-generation alignment.
Introduces a multi-agent debate framework, DREAM, that leverages opposing LLM agents and iterative critique to improve the accuracy and scalability of relevance assessment for IR benchmarks.
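DREAM's decision rule can be distilled to a few lines. The actual protocol runs multiple debate rounds with opposing stances and iterative critique; this sketch keeps only the final escalation logic, with hypothetical verdict labels.

```python
def label_with_escalation(agent_verdicts, human_oracle):
    # Unanimous agent agreement is accepted automatically;
    # any disagreement is escalated to a human annotator.
    if len(set(agent_verdicts)) == 1:
        return agent_verdicts[0], "auto"
    return human_oracle(), "escalated"

def human_rate(decisions):
    # Fraction of items that needed a human -- the quantity the paper
    # reports as 3.5% involvement.
    escalated = sum(1 for _, route in decisions if route == "escalated")
    return escalated / len(decisions)
```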
The paper introduces AdNanny, a unified reasoning-centric LLM fine-tuned from a 671B DeepSeek-R1 checkpoint for various offline advertising tasks. The authors construct reasoning-augmented corpora with structured supervision and natural language explanations, and then use multi-task supervised fine-tuning with adaptive reweighting followed by reinforcement learning to align with online advertising objectives. Deployed in Bing Ads, AdNanny reduces manual labeling effort and improves accuracy, demonstrating a scalable and cost-effective solution by consolidating task-specific models.
The paper demonstrates that a single, reasoning-centric LLM, AdNanny, can effectively replace multiple task-specific models for offline advertising tasks, leading to improved accuracy and reduced manual effort.
The paper introduces LR-bench, a new benchmark for reviewer assignment comprising 1,055 expert-annotated paper-reviewer pairs from 2024-2025 AI/NLP manuscripts, each scored on a five-level self-assessed familiarity scale. It then proposes RATE, a reviewer-centric ranking framework that distills reviewer publications into keyword profiles and fine-tunes an embedding model using weak supervision from heuristic retrieval signals. Experiments on LR-bench and the CMU dataset demonstrate that RATE achieves state-of-the-art performance compared to strong embedding baselines.
Introduces a novel reviewer-centric ranking framework, RATE, that leverages keyword-based reviewer profiles and weak supervision to improve reviewer assignment.
This paper surveys recent advances in applying deep learning to information systems, contrasting them with classical pattern recognition techniques for text. It highlights the use of large language models and transformer architectures like BERT in digital assistants and various NLP tasks. The review covers post-training alignment, parsing, and reinforcement learning techniques used to improve these systems.
Synthesizes recent progress in applying deep learning, particularly large language models and transformers, to a range of information system tasks, providing context through classical pattern recognition methods.
This paper introduces a framework to study how source preferences influence LLMs' resolution of knowledge conflicts in retrieval-augmented generation. The authors evaluate 13 open-weight LLMs and find that they generally favor institutionally corroborated information (e.g., government, newspapers) over information from people and social media, but this preference can be overridden by repetition. They propose a novel method to reduce repetition bias, achieving up to 99.8% reduction while maintaining at least 88.8% of the original source preferences.
Introduces a novel framework and method to analyze and mitigate repetition bias in LLMs' source preferences when resolving knowledge conflicts.
The paper introduces Dynamic Tool Dependency Retrieval (DTDR), a retrieval method that conditions on both the initial query and the evolving execution context to address the limitations of static tool retrieval in function calling agents. DTDR models tool dependencies from function calling demonstrations, enabling adaptive retrieval as plans unfold and improving the selection of relevant tools. Experiments across multiple datasets and LLM backbones demonstrate that DTDR significantly improves function calling success rates, achieving gains between 23% and 104% compared to static retrievers.
Introduces a dynamic tool retrieval mechanism that leverages evolving execution context to model tool dependencies and improve function calling accuracy.
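The abstract states that DTDR conditions retrieval on both the query and the evolving execution context, with dependencies learned from function-calling demonstrations; a minimal sketch of that idea, using pairwise follow counts as the dependency model and a simple linear blend with static query relevance (the counting scheme, `alpha` blend, and all names are illustrative assumptions, not the paper's method):

```python
from collections import Counter

def learn_dependencies(demonstrations):
    """Count how often tool B directly follows tool A across
    demonstration call sequences."""
    follows = Counter()
    for seq in demonstrations:
        for a, b in zip(seq, seq[1:]):
            follows[(a, b)] += 1
    return follows

def retrieve_tools(query_scores, executed, follows, alpha=0.5, k=2):
    """Blend static query relevance with a dependency prior conditioned
    on the tools already executed in the current plan."""
    scored = []
    for tool, q_score in query_scores.items():
        if tool in executed:
            continue
        # Dependency prior: evidence that this tool follows what ran so far
        dep = sum(follows[(prev, tool)] for prev in executed)
        scored.append((alpha * q_score + (1 - alpha) * dep, tool))
    scored.sort(reverse=True)
    return [tool for _, tool in scored[:k]]
```

The sketch shows the core contrast with static retrieval: a tool that scores poorly against the initial query can still surface once the executed prefix makes it the likely next step.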
This paper presents a systematic review and meta-analysis of 98 publications from 2023-2025 to analyze the architectures and performance of generative AI-powered teaching assistants, focusing on Retrieval-Augmented Generation (RAG) systems. The study examines the fusion of Transformer-based LLMs and RAG across theory, architecture, mechanism, and application, identifying key technical improvement directions like domain knowledge base construction and hybrid retrieval optimization. Meta-analysis reveals that RAG-enhanced systems achieve significantly higher accuracy (87.3%) and learning effectiveness (Cohen's d = 0.68) compared to pure generative models, with Transformer and RAG integration becoming dominant architectures.
Systematically analyzes the architectural evolution and performance of RAG-enhanced generative AI systems in educational question-answering, quantifying the benefits of RAG and identifying key areas for future improvement.
This paper introduces FCDP, a credit default prediction model that combines an Enhanced Transformer module (ETransformer) for efficient feature filtering and long-range modeling, an Attention Guidance Prediction Module (AGPM) to enhance feature representation and suppress deep feature loss, and a Channel Attention Module (CAM) to learn channel importance. The model addresses limitations in existing credit default prediction research, such as reliance on manual feature engineering and insufficient feature extraction. Experiments on the Lending Club dataset demonstrate that FCDP outperforms six other forecasting models, suggesting its potential for improved risk assessment.
Introduces a novel credit default prediction model (FCDP) that integrates an Enhanced Transformer, Attention Guidance Prediction Module, and Channel Attention Module to improve prediction accuracy and computational efficiency.
The paper introduces CoSense-LLM, an edge-first framework that converts multimodal sensor data into semantic tokens and coordinates with LLMs while considering latency, energy, bandwidth, and privacy constraints. CoSense-LLM employs a lightweight encoder (SenseFusion), edge-based retrieval (Edge-RAG), cost-aware prompt routing, and secure execution to minimize data transmission and ensure privacy. Experiments across diverse environments demonstrate that CoSense-LLM achieves sub-second latency, reduces bandwidth costs through local retrieval, and preserves privacy by transmitting only discrete codes.
Introduces an edge-first framework, CoSense-LLM, that enables efficient and privacy-preserving integration of multimodal sensor data with large language models under resource constraints.
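The abstract mentions cost-aware prompt routing but not its decision rule; a minimal sketch of one plausible policy, keeping a request on the edge when local confidence is high or when the cloud call would blow the remaining budget (threshold, pricing, and signature are all hypothetical, not CoSense-LLM's actual router):

```python
def route_prompt(local_confidence, prompt_tokens, remaining_budget,
                 conf_threshold=0.8, price_per_token=0.001):
    """Cost-aware routing sketch: serve from the edge model when it is
    confident enough, or when sending the prompt to the cloud would
    exceed the remaining budget; otherwise escalate to the cloud LLM."""
    cloud_cost = prompt_tokens * price_per_token
    if local_confidence >= conf_threshold or cloud_cost > remaining_budget:
        return "edge"
    return "cloud"
```

A real router would also weigh latency and privacy constraints (the paper lists both), but the budget/confidence trade-off is the core of any cost-aware policy.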
This paper introduces a post-tool execution reflection mechanism that leverages LLM-based reflection and domain-specific RAG to repair failed tool calls in agentic systems. The approach uses a combination of tool-specific documentation and troubleshooting documents to identify and correct both syntactic and semantic errors that are only apparent after the tool's response is analyzed. Experiments using the kubectl command-line tool for Kubernetes management demonstrate that the RAG-based reflection improves the execution pass rate by 55% and the correctness of answers to user queries by 36% on average, with troubleshooting documents outperforming official documentation.
Introduces a novel post-tool execution reflection component that combines LLM-based reflection with domain-specific RAG to improve the reliability and accuracy of tool calls in agentic systems.
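The summary describes a loop (execute, inspect the response, retrieve troubleshooting docs, repair, retry) without giving its structure; a minimal sketch of that loop with the tool executor, retriever, and LLM reflection step injected as callables (the result schema and all names are assumptions for illustration, not the paper's API):

```python
def run_with_reflection(command, execute, retrieve_docs, reflect, max_retries=2):
    """Post-tool execution reflection: run a tool call, and on failure
    retrieve troubleshooting snippets for the observed error (the RAG
    step) and ask an LLM-backed `reflect` step to propose a repaired
    call, up to max_retries times."""
    result = execute(command)
    for _ in range(max_retries):
        if result["ok"]:
            break
        docs = retrieve_docs(result["error"])       # domain-specific RAG
        command = reflect(command, result["error"], docs)
        result = execute(result and command)
    return result
```

The key property the paper exploits is that both syntactic and semantic errors only become visible in `result["error"]`, i.e., after execution, which is why reflection runs post-hoc rather than at planning time.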
This paper introduces SQUARE, a training-free zero-shot composed image retrieval (ZS-CIR) framework that uses multimodal large language models (MLLMs) to improve retrieval accuracy. SQUARE employs Semantic Query-Augmented Fusion (SQAF) to enrich the query embedding with MLLM-generated captions, providing high-level semantic guidance. It also uses Efficient Batch Reranking (EBR), where an MLLM jointly reasons about top-ranked candidates presented as an image grid to refine the ranking in a single pass.
Introduces a two-stage training-free ZS-CIR framework, SQUARE, that leverages MLLMs for semantic query augmentation and efficient batch reranking to improve retrieval accuracy without task-specific training.
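The abstract says SQAF enriches the query embedding with MLLM-generated captions but not how the two signals are merged; a minimal sketch of one common fusion scheme, linear interpolation of the visual query embedding with the caption embedding followed by L2 normalization (the weight and function are illustrative assumptions, not necessarily SQUARE's exact formulation):

```python
def fuse_query(image_emb, caption_emb, weight=0.5):
    """Semantic query-augmented fusion (sketch): interpolate the visual
    query embedding with an MLLM caption embedding, then L2-normalize
    so the fused vector is comparable under cosine similarity."""
    fused = [weight * a + (1 - weight) * b
             for a, b in zip(image_emb, caption_emb)]
    norm = sum(x * x for x in fused) ** 0.5
    return [x / norm for x in fused] if norm else fused
```

Normalizing after fusion matters because the downstream retrieval step ranks candidates by cosine similarity, where unnormalized magnitudes would distort scores.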

