This paper introduces SPARK, an inference-time harness that improves the security of code generated by large language models (LLMs) without retraining. SPARK assumes that pretraining corpora already contain ample security knowledge and activates it with a short structured cue in the prompt and a precomputed token bias on the logits, targeting the exploitable flaws that LLM-generated code commonly contains. Evaluated on 9 open-source models across C++, Java, and Python, SPARK matches or improves on the best of 7 fine-tuning and retrieval-augmented baselines in every setting while preserving HumanEval utility.
Activating latent security knowledge in LLMs can significantly reduce exploitable vulnerabilities in generated code without the overhead of retraining.
Large language models routinely generate code with exploitable security flaws. Prior literature attributes this limitation to a lack of security expertise, steering current defense mechanisms toward heavy fine-tuning or external knowledge retrieval, both of which introduce significant computational overhead and data bias through redundant code examples. Contrary to this view, we argue that pretraining corpora are already rich in security material. The bottleneck is activation: without an explicit and brief cue, statistical pressure toward common training-distribution patterns suppresses the model's safety-relevant representations. We present SPARK, an inference-time security harness that activates this latent knowledge without any retraining. The harness has two parts. Component I retrieves a few relevant Common Weakness Enumeration (CWE) entries for each coding task and appends a short structured cue to the prompt; this alone is enough to surface the model's existing security representations. Component II adds a precomputed token bias to the logits at every decoding step. We obtain the bias by projecting a safe-direction vector, the unit difference between the mean safe and mean unsafe last-layer hidden states, through the language model head. The bias is computed once offline; applying it costs a single vector addition per generated token. We evaluate SPARK on 9 open-source models across C++, Java, and Python, and compare it with 7 baselines spanning fine-tuning and retrieval-augmented methods. SPARK matches or improves on the best baseline in every setting while preserving HumanEval utility. We further test Component I in a black-box setting on 7 of today's strongest models, including Claude, DeepSeek, and GPT; the results show that the same activation bottleneck drives their insecure code generation and that our method improves it.
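The token bias described above (the second component) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy hidden states, the toy language model head, and the scaling factor `alpha` are all assumptions introduced here, and plain Python lists stand in for tensors. The sketch only follows the recipe stated in the abstract: take the unit difference between the mean safe and mean unsafe hidden states, project it through the head to get one bias value per vocabulary token, and add that bias to the logits at each decoding step.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def safe_direction(safe_states, unsafe_states):
    """Unit vector from the mean unsafe state toward the mean safe state."""
    mu_safe = mean_vector(safe_states)
    mu_unsafe = mean_vector(unsafe_states)
    diff = [s - u for s, u in zip(mu_safe, mu_unsafe)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("safe and unsafe means coincide; direction undefined")
    return [x / norm for x in diff]


def token_bias(lm_head, direction, alpha=1.0):
    """Project the direction through the LM head (vocab_size x hidden_size).

    `alpha` is a hypothetical strength knob; the abstract does not specify
    whether or how the bias is scaled.
    """
    return [alpha * sum(w * d for w, d in zip(row, direction)) for row in lm_head]


def apply_bias(logits, bias):
    """Per-step cost: a single vector addition over the vocabulary."""
    return [l + b for l, b in zip(logits, bias)]


# Toy data (assumed): hidden size 2, vocabulary of 3 tokens.
safe_states = [[1.0, 0.0], [3.0, 0.0]]
unsafe_states = [[0.0, 0.0], [0.0, 0.0]]
lm_head = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

direction = safe_direction(safe_states, unsafe_states)  # computed once, offline
bias = token_bias(lm_head, direction)                   # computed once, offline
biased_logits = apply_bias([0.5, 0.5, 0.5], bias)       # repeated every step
```

In this toy setup the first token aligns with the safe direction and gains logit mass, the third token opposes it and loses mass, and the second is orthogonal and unchanged. Because the bias is fixed after the offline step, decoding adds no extra forward passes.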