This paper investigates the vulnerability of LLM rankers to jailbreak prompt injection attacks across LLM families, architectures, and ranking paradigms. The study quantifies vulnerability using attack success rate (ASR) and nDCG@10, evaluating pairwise, listwise, and setwise ranking approaches under two injection variants: decision objective hijacking and decision criteria hijacking. The key finding is that encoder-decoder architectures are more resilient to these attacks than other architectures, which indicates that vulnerability depends on the backbone architecture.
Encoder-decoder LLMs exhibit surprisingly strong inherent resilience to jailbreak prompt injection attacks when used as rankers, challenging the assumption that all LLMs are equally vulnerable.
Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has shown, however, that simple prompt injections embedded within a candidate document (i.e., jailbreak prompt attacks) can significantly alter an LLM's ranking decisions. While this poses serious security risks to LLM-based ranking pipelines, the extent to which this vulnerability persists across diverse LLM families, architectures, and settings remains largely under-explored. In this paper, we present a comprehensive empirical study of jailbreak prompt attacks against LLM rankers. We focus our evaluation on two complementary tasks: (1) Preference Vulnerability Assessment, which measures intrinsic susceptibility via attack success rate (ASR); and (2) Ranking Vulnerability Assessment, which quantifies the operational impact on ranking quality (nDCG@10). We systematically examine three prevalent ranking paradigms (pairwise, listwise, setwise) under two injection variants: decision objective hijacking and decision criteria hijacking. Beyond reproducing prior findings, we expand the analysis to cover vulnerability scaling across model families, position sensitivity, backbone architectures, and cross-domain robustness. Our results characterize the boundary conditions of these vulnerabilities and reveal critical insights, for example that encoder-decoder architectures exhibit strong inherent resilience to jailbreak attacks. We publicly release our code and additional experimental results at https://github.com/ielab/LLM-Ranker-Attack.
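To make the two evaluation measures concrete, the following is a minimal Python sketch and not the authors' released code. It assumes that ASR is the fraction of attacked comparisons in which the ranker ends up preferring the injected document, and that nDCG@10 uses linear gain with a log2 rank discount, with the ideal ordering taken from the same list of judged documents. The function names and the toy relevance labels are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain, assumed)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """nDCG@k, normalised by the best possible ordering of the same documents."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Share of attacked comparisons where the injected document was preferred."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Toy example (hypothetical labels): relevance grades in ranked order.
clean_ranking = [3, 2, 1, 0]     # ranker output without injection
attacked_ranking = [0, 3, 2, 1]  # the injected, irrelevant document is promoted to rank 1

ndcg_clean = ndcg_at_k(clean_ranking)
ndcg_attacked = ndcg_at_k(attacked_ranking)
asr = attack_success_rate([True, True, False, True])
```

In this toy case the clean ranking is already ideal, so its nDCG@10 is 1.0, while promoting the irrelevant document lowers the score to roughly 0.70. The ASR of 0.75 reflects three successful attacks out of four comparisons.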