Search papers, labs, and topics across Lattice.
This paper introduces Prompt Control-Flow Integrity (PCFI), a runtime defense mechanism against prompt injection attacks that leverages the structured composition of prompts into system, developer, user, and retrieved-document segments. PCFI employs a three-stage middleware pipeline involving lexical heuristics, role-switch detection, and hierarchical policy enforcement to filter malicious requests. Evaluation on a custom benchmark demonstrates that PCFI intercepts all attack-labeled requests with a 0% False Positive Rate and minimal processing overhead (0.04 ms median).
Stop prompt injections cold: PCFI's priority-aware runtime defense intercepts all attacks in testing with zero false positives and negligible overhead.
Large language models (LLMs) deployed behind APIs and retrieval-augmented generation (RAG) stacks are vulnerable to prompt injection attacks that may override system policies, subvert intended behavior, and induce unsafe outputs. Existing defenses often treat prompts as flat strings and rely on ad hoc filtering or static jailbreak detection. This paper proposes Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models each request as a structured composition of system, developer, user, and retrieved-document segments. PCFI applies a three-stage middleware pipeline, lexical heuristics, role-switch detection, and hierarchical policy enforcement, before forwarding requests to the backend LLM. We implement PCFI as a FastAPI-based gateway for deployed LLM APIs and evaluate it on a custom benchmark of synthetic and semi-realistic prompt-injection workloads. On the evaluated benchmark suite, PCFI intercepts all attack-labeled requests, maintains a 0% False Positive Rate, and introduces a median processing overhead of only 0.04 ms. These results suggest that provenance- and priority-aware prompt enforcement is a practical and lightweight defense for deployed LLM systems.