IDSIA, Jilin, KAUST, NNAISENSE, ZJU | Jun 15, 2026 | arXiv:2606.16821

How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation

Yimeng Chen, Zhe Ren, Firas Laakom, Yu Li, Dandan Guo, Jürgen Schmidhuber

AI Summary

This study introduces SearchGEO, a controlled framework for evaluating how vulnerable LLM-based search agents are to endorsement corruption caused by manipulated web content. Evaluating 13 LLM backends on 308 cases each, the authors find that overall attack success rate (ASR) ranges from 0.0% on Claude-Sonnet-4.6 to 31.4% on Gemini-3-Flash, and that the strongest attack mode differs by model family. They argue that recommendation reliability under adversarial search content should be treated as a first-class dimension of backend safety evaluation.

Key Contribution

Endorsement corruption rates in LLM search agents vary widely across backends (overall ASR from 0.0% to 31.4%), raising concerns about how reliable their recommendations are when web content has been manipulated.

Abstract

Large language model (LLM)-based search agents synthesize open-web content into actionable recommendations on behalf of users, creating a risk that attacker-published pages are transformed into endorsed claims. We introduce SearchGEO, a controlled evaluation framework for measuring endorsement corruption in LLM-based web-search agents, combining a web-evidence manipulation pipeline, a five-mode attack taxonomy, and multiple output-level metrics. We evaluate 13 LLM backends on 308 cases each. Results show that vulnerability patterns vary across backends: overall attack success rate (ASR) ranges from 0.0% on Claude-Sonnet-4.6 to 31.4% on Gemini-3-Flash, the strongest attack mode differs by model family, and the same deployment scaffold could amplify or decrease ASR on different backends. An auxiliary agent-skill probe, where endorsement becomes an install command, exposes a sharp split among otherwise robust backends: Claude over-rejects while GPT over-trusts. These findings argue for treating recommendation reliability under adversarial search content as a first-class dimension of backend safety evaluation.
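The abstract reports attack success rate (ASR) overall, per backend, and per attack mode. As a minimal sketch of how such a rate could be tallied, the snippet below treats ASR as the fraction of evaluated cases in which the attack was judged successful. That definition, the record fields (`backend`, `mode`, `attack_succeeded`), and the toy data are all assumptions made for illustration. They are not taken from SearchGEO, whose output-level metrics may be defined differently.

```python
from collections import defaultdict


def attack_success_rate(outcomes):
    """Fraction of cases flagged as successful attacks (assumed ASR definition)."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one case")
    return sum(1 for succeeded in outcomes if succeeded) / len(outcomes)


def asr_by_group(records, key):
    """Group records by `key` (e.g. backend or attack mode) and compute ASR per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["attack_succeeded"])
    return {group: attack_success_rate(flags) for group, flags in groups.items()}


# Hypothetical toy records; names and outcomes are placeholders, not paper results.
records = [
    {"backend": "backend_a", "mode": "mode_1", "attack_succeeded": False},
    {"backend": "backend_a", "mode": "mode_1", "attack_succeeded": False},
    {"backend": "backend_a", "mode": "mode_2", "attack_succeeded": False},
    {"backend": "backend_a", "mode": "mode_2", "attack_succeeded": False},
    {"backend": "backend_b", "mode": "mode_1", "attack_succeeded": True},
    {"backend": "backend_b", "mode": "mode_1", "attack_succeeded": False},
    {"backend": "backend_b", "mode": "mode_2", "attack_succeeded": False},
    {"backend": "backend_b", "mode": "mode_2", "attack_succeeded": False},
]

overall = attack_success_rate(r["attack_succeeded"] for r in records)
per_backend = asr_by_group(records, "backend")
per_mode = asr_by_group(records, "mode")
```

Grouping by `mode` within a single backend would show which attack mode is strongest for that backend, which is the kind of comparison the abstract describes across model families.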

Recommendation & Information Retrieval | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
