Mila, Capital One | Apr 27, 2026 | arXiv:2604.24608

Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models

Yuxing Tian, Fengran Mo, Zhiqi Huang, Weixu Zhang, Jian-Yun Nie

AI Summary

The paper introduces RouteHead, a query-dependent head selection method for attention-based re-ranking with LLMs that addresses the limitations of static head selection and of naive aggregation across all heads. The authors train a lightweight router that maps each query to an optimal head set, using pseudo-labels generated by an offline search and a sparsity regularizer during training. Experiments across multiple benchmarks and LLM backbones show that RouteHead consistently outperforms strong baselines by aggregating attention signals only from the most informative heads for each query.
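The summary does not say how the offline search is carried out. As one plausible instance, the sketch below uses greedy forward selection: for a single training query, heads are added one at a time as long as nDCG@k of the head-averaged scores strictly improves. The greedy strategy, mean aggregation, nDCG metric, and the names `greedy_head_search`, `aggregate`, and `ndcg_at_k` are assumptions for illustration, not the paper's procedure.

```python
import math
from typing import List, Sequence, Tuple

# head_scores[h][d]: relevance score that head h assigns to candidate document d
# (hypothetical input, e.g. attention mass from the query to the document's tokens).
HeadScores = Sequence[Sequence[float]]


def ndcg_at_k(scores: Sequence[float], labels: Sequence[int], k: int = 10) -> float:
    """nDCG@k of the ranking induced by `scores` against graded relevance `labels`."""
    order = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    dcg = sum((2 ** labels[d] - 1) / math.log2(rank + 2) for rank, d in enumerate(order[:k]))
    ideal = sorted(labels, reverse=True)[:k]
    idcg = sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def aggregate(head_scores: HeadScores, heads: Sequence[int]) -> List[float]:
    """Average each document's score over the given head set."""
    num_docs = len(head_scores[0])
    return [sum(head_scores[h][d] for h in heads) / len(heads) for d in range(num_docs)]


def greedy_head_search(
    head_scores: HeadScores, labels: Sequence[int], max_heads: int = 4, k: int = 10
) -> Tuple[List[int], float]:
    """Greedily add the head that most improves nDCG@k; stop when no head strictly helps."""
    selected: List[int] = []
    best_metric = -1.0
    while len(selected) < max_heads:
        best_candidate = None
        for h in range(len(head_scores)):
            if h in selected:
                continue
            metric = ndcg_at_k(aggregate(head_scores, selected + [h]), labels, k)
            if metric > best_metric:
                best_metric, best_candidate = metric, h
        if best_candidate is None:
            break  # no remaining head improves the metric
        selected.append(best_candidate)
    return sorted(selected), best_metric


if __name__ == "__main__":
    # Toy query with 4 candidate documents; documents 0 and 1 are relevant.
    # Head 0 finds document 0 but misses document 1; head 1 is the mirror image.
    head_scores = [[1.0, 0.3, 0.4, 0.0], [0.3, 1.0, 0.0, 0.4]]
    labels = [1, 1, 0, 0]
    print(greedy_head_search(head_scores, labels, max_heads=4, k=4))  # -> ([0, 1], 1.0)
```

In this toy case neither head alone puts both relevant documents on top, so the search keeps both; the returned head set would then serve as that query's pseudo-label for router training.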

Key Contribution

LLMs re-rank documents better when each query is routed to the specific attention heads that matter for it, rather than relying on a static subset of heads or on all heads at once.
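To make "routing a query to heads" concrete, the sketch below scores each document by the attention mass that query tokens place on that document's tokens in each head, then averages only over a chosen head set. The attention layout (`attn[head][query_token][context_token]`), the span-based scoring, the mean aggregation, and the function names are illustrative assumptions rather than the paper's exact formulation.

```python
from typing import List, Sequence, Tuple

# attn[h][i][j]: attention weight from query token i to context token j in head h (hypothetical layout).
Attention = Sequence[Sequence[Sequence[float]]]


def per_head_doc_scores(attn: Attention, doc_spans: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """For each head, score each document by the attention mass query tokens put on its tokens."""
    scores = []
    for head in attn:
        head_scores = []
        for start, end in doc_spans:
            # Sum attention over the document's token span, then average over query tokens.
            mass = sum(sum(row[start:end]) for row in head) / len(head)
            head_scores.append(mass)
        scores.append(head_scores)
    return scores


def rerank(
    attn: Attention, doc_spans: Sequence[Tuple[int, int]], selected_heads: Sequence[int]
) -> Tuple[List[int], List[float]]:
    """Rank documents using only the attention signals of `selected_heads`."""
    per_head = per_head_doc_scores(attn, doc_spans)
    num_docs = len(doc_spans)
    agg = [sum(per_head[h][d] for h in selected_heads) / len(selected_heads) for d in range(num_docs)]
    order = sorted(range(num_docs), key=lambda d: agg[d], reverse=True)
    return order, agg


if __name__ == "__main__":
    # Toy example: 2 heads, 1 query token, 4 context tokens; doc 0 = tokens 0-1, doc 1 = tokens 2-3.
    attn = [
        [[0.1, 0.1, 0.4, 0.4]],  # head 0 attends mostly to doc 1
        [[0.4, 0.4, 0.1, 0.1]],  # head 1 attends mostly to doc 0
    ]
    spans = [(0, 2), (2, 4)]
    print(rerank(attn, spans, [0])[0])     # -> [1, 0]
    print(rerank(attn, spans, [1])[0])     # -> [0, 1]
    print(rerank(attn, spans, [0, 1])[1])  # -> [0.5, 0.5]: the two heads' signals cancel
```

Here the two heads rank the documents in opposite orders, so averaging both produces a tie. This is a small instance of the conflicting ranking signals the abstract describes, and it shows why picking the right heads per query can matter more than using all of them.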

Abstract

Large Language Models (LLMs) have recently been explored as fine-grained zero-shot re-rankers by leveraging attention signals to estimate document relevance. However, existing methods either aggregate attention signals across all heads or rely on a statically selected subset identified by heuristic rules. Such approaches can be suboptimal because the informative heads can vary across queries or domains. Moreover, naively combining multiple heads can degrade performance due to redundancy or conflicting ranking signals. In this paper, we propose a query-dependent head selection method, RouteHead, for attention-based re-ranking with LLMs. Specifically, we learn a lightweight router that can map each query to an optimal head set, and relevance scores are computed by aggregating attention signals only from these heads. Since optimal query-to-head labels are unavailable, we first construct pseudo labels via an offline search. The router represents each head with a learnable embedding and represents each query using an embedding extracted from the hidden states of the frozen LLM. It is then trained on the pseudo labels with a sparsity regularizer. Experiments on diverse benchmarks and multiple LLM backbones show that the proposed method consistently outperforms strong baselines.
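The abstract does not specify the router's scoring function, loss, or selection rule. The following is a minimal sketch assuming a dot product between a fixed query embedding and learnable head embeddings, a per-head sigmoid trained with binary cross-entropy against the pseudo-label head mask, and a penalty on the mean predicted probability standing in for the sparsity regularizer. `HeadRouter`, `train_step`, `route`, the 0.5 threshold, and all hyperparameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class HeadRouter:
    """Hypothetical router with one learnable embedding per attention head.

    The query embedding is a fixed input (standing in for hidden states of the
    frozen LLM); only the head embeddings are trained.
    """

    def __init__(self, num_heads: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.head_emb = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_heads)]

    def probs(self, q: Sequence[float]) -> List[float]:
        # Selection probability per head: sigmoid of the dot product <q, e_h>.
        return [sigmoid(sum(qj * ej for qj, ej in zip(q, e))) for e in self.head_emb]

    def train_step(self, q: Sequence[float], target: Sequence[int],
                   lr: float = 1.0, sparsity: float = 0.01) -> float:
        """One SGD step on mean BCE(p, target) + sparsity * mean(p); returns the pre-update loss."""
        p = self.probs(q)
        num_heads = len(p)
        loss = 0.0
        for h, (ph, yh) in enumerate(zip(p, target)):
            pc = min(max(ph, 1e-7), 1.0 - 1e-7)  # clamp so the log stays finite
            loss += (-(yh * math.log(pc) + (1 - yh) * math.log(1.0 - pc)) + sparsity * ph) / num_heads
            # d(loss)/d(logit_h): BCE gives (p - y); the sparsity penalty gives sparsity * p * (1 - p).
            grad_logit = ((ph - yh) + sparsity * ph * (1.0 - ph)) / num_heads
            self.head_emb[h] = [ej - lr * grad_logit * qj for ej, qj in zip(self.head_emb[h], q)]
        return loss

    def route(self, q: Sequence[float], threshold: float = 0.5) -> List[int]:
        """Heads whose probability exceeds `threshold`; fall back to the single most likely head."""
        p = self.probs(q)
        selected = [h for h, ph in enumerate(p) if ph > threshold]
        return selected or [max(range(len(p)), key=lambda h: p[h])]


if __name__ == "__main__":
    # Toy data: 2-d "query embeddings" and pseudo-label head masks over 4 heads.
    data = [([1.0, 0.0], [1, 0, 0, 1]), ([0.0, 1.0], [0, 1, 1, 0])]
    router = HeadRouter(num_heads=4, dim=2)
    for _ in range(200):
        for q, y in data:
            router.train_step(q, y)
    print(router.route([1.0, 0.0]), router.route([0.0, 1.0]))  # -> [0, 3] [1, 2]
```

Thresholding lets the number of selected heads vary per query; a top-k rule would be an equally plausible alternative, and the abstract does not state which selection rule RouteHead uses.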

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 53

Year: 2026

Venue: N/A
