BIT · Joy Future Academy · SenseTime · Jun 15, 2026 · arXiv:2606.16409

PathRouter: Aligning Rewards with Retrieval Quality in Agentic Graph Retrieval-Augmented Generation

Bo Wang, Heyan Huang, Yaolin Li, Wei Tang, Yuan Zhang, Wenbo Li, Mingze Gao, Ge Shi, Chong Feng

AI Summary

This paper introduces PathRouter, a path-aware training framework for agentic Graph Retrieval-Augmented Generation (GraphRAG) that addresses two problems in outcome-only reinforcement learning: answer-path reward aliasing and search-update ambiguity. By evaluating each trajectory on both answer correctness and evidence-path overlap, PathRouter discourages shortcut reinforcement while preserving evidence-seeking behavior. Experiments on six QA benchmarks across three model sizes show consistent improvements in answer F1 and evidence-path overlap, with average F1 gains of 3.1 on 3B and 4.9 on 7B models over a strong baseline.

Key Contribution

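PathRouter reduces reliance on shortcuts in reinforcement learning for agentic GraphRAG: it scales GRPO advantages according to answer correctness and evidence-path overlap, and adds token-level KL guidance from a frozen gold-evidence teacher for evidence-poor trajectories.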

Abstract

Agentic GraphRAG trains language-model agents to iteratively retrieve and reason over graph-structured evidence, enabling more accurate and context-aware decision-making by efficiently navigating complex information networks. However, outcome-only reinforcement learning suffers from answer-path reward aliasing, where correct answers may come from shortcuts rather than useful evidence paths. It also exhibits search-update ambiguity, as scalar trajectory-level feedback does not indicate which retrieval actions to adjust. To mitigate these shortcomings, we present PathRouter, a path-aware training framework for agentic GraphRAG. PathRouter jointly evaluates each trajectory along answer correctness and evidence-path overlap, yielding four trajectory categories with differentiated GRPO advantage scaling that suppresses shortcut reinforcement while preserving evidence-seeking behavior. For evidence-poor trajectories, a frozen gold-evidence teacher provides token-level KL guidance on reasoning and search-query tokens, excluding answer tokens to avoid direct response imitation. Experiments on six QA benchmarks across three model sizes show that PathRouter consistently improves answer F1 and evidence-path overlap, achieving average F1 gains of 3.1 on 3B and 4.9 on 7B models compared to a strong baseline.
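
The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the thresholds, the scale factors, the category names, the token role labels, and the KL direction (teacher to student) are all assumptions chosen for illustration, since the abstract gives none of these values. Only the overall shape comes from the text: four categories from answer correctness and evidence-path overlap, per-category scaling of group-relative (GRPO-style) advantages, and a token-level KL term that skips answer tokens.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    answer_f1: float     # answer correctness score in [0, 1]
    path_overlap: float  # overlap between retrieved and gold evidence path, in [0, 1]


# Hypothetical values: the abstract does not specify thresholds or scales.
ANSWER_THRESHOLD = 0.5
OVERLAP_THRESHOLD = 0.5
ADVANTAGE_SCALE = {
    ("correct", "evidence_rich"): 1.0,  # right answer via useful evidence
    ("correct", "evidence_poor"): 0.3,  # likely shortcut: damp the reward
    ("wrong", "evidence_rich"): 0.5,    # soften penalty to keep evidence-seeking
    ("wrong", "evidence_poor"): 1.0,    # wrong answer, poor evidence
}


def categorize(t: Trajectory) -> tuple[str, str]:
    """Assign one of four categories from the two evaluation axes."""
    answer = "correct" if t.answer_f1 >= ANSWER_THRESHOLD else "wrong"
    evidence = "evidence_rich" if t.path_overlap >= OVERLAP_THRESHOLD else "evidence_poor"
    return answer, evidence


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Standard GRPO-style normalization within one sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-6) for r in rewards]


def scaled_advantages(group: list[Trajectory]) -> list[float]:
    """Scale each outcome-based advantage by its category factor."""
    base = group_relative_advantages([t.answer_f1 for t in group])
    return [a * ADVANTAGE_SCALE[categorize(t)] for a, t in zip(base, group)]


def masked_token_kl(teacher_probs, student_probs, token_roles) -> float:
    """Mean KL(teacher || student) over non-answer tokens.

    Each element of teacher_probs / student_probs is one token position's
    distribution over the vocabulary; student probabilities are assumed > 0.
    token_roles holds the assumed labels 'reason', 'search', or 'answer'.
    """
    total, count = 0.0, 0
    for p, q, role in zip(teacher_probs, student_probs, token_roles):
        if role == "answer":
            continue  # excluded to avoid direct response imitation
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
        count += 1
    return total / count if count else 0.0
```

With these assumed factors, a correct but evidence-poor trajectory still receives a positive advantage, only a smaller one than a correct, evidence-rich trajectory in the same group, so the policy update leans toward answers reached through the evidence path.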

Recommendation & Information Retrieval · RLHF & Preference Learning · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
