The paper introduces SemSIEdit, an inference-time framework in which an agentic "Editor" iteratively critiques and rewrites sensitive spans in LLM outputs to mitigate Semantic Sensitive Information (SemSI) leakage. The authors report a Privacy-Utility Pareto Frontier on which this rewriting approach reduces SemSI leakage by 34.6% at a 9.8% utility loss. The study also reveals a Scale-Dependent Safety Divergence, where larger models achieve safety through constructive expansion while smaller models rely on destructive truncation, and a Reasoning Paradox, where inference-time reasoning increases both baseline risk and defense efficacy.
LLMs face a Scale-Dependent Safety Divergence: larger reasoning models achieve safety by adding nuance, whereas capacity-constrained models revert to deleting text.
While defenses for structured PII are mature, Large Language Models (LLMs) pose a new threat: Semantic Sensitive Information (SemSI), where models infer sensitive identity attributes, generate reputation-harmful content, or hallucinate potentially wrong information. The capacity of LLMs to self-regulate these complex, context-dependent sensitive information leaks without destroying utility remains an open scientific question. To address this, we introduce SemSIEdit, an inference-time framework where an agentic "Editor" iteratively critiques and rewrites sensitive spans to preserve narrative flow rather than simply refusing to answer. Our analysis reveals a Privacy-Utility Pareto Frontier, where this agentic rewriting reduces leakage by 34.6% across all three SemSI categories while incurring a marginal utility loss of 9.8%. We also uncover a Scale-Dependent Safety Divergence: large reasoning models (e.g., GPT-5) achieve safety through constructive expansion (adding nuance), whereas capacity-constrained models revert to destructive truncation (deleting text). Finally, we identify a Reasoning Paradox: while inference-time reasoning increases baseline risk by enabling the model to make deeper sensitive inferences, it simultaneously empowers the defense to execute safe rewrites.
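The critique-and-rewrite loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `toy_critic`, `toy_rewriter`, and `edit_loop`, the keyword matching, the canned replacements, and the `max_rounds` cap are all hypothetical stand-ins for what would be LLM calls in the actual framework. The sketch only shows the control flow: flag sensitive spans, rewrite them in place instead of refusing, and repeat until the critic finds nothing.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets into the text

# Hypothetical canned rewrites; a real Editor would generate these with an LLM.
SAFE_REWRITES = {
    "likely has diabetes": "has not publicly discussed their health",
    "is a fraud": "has faced unproven allegations",
}


def toy_critic(text: str) -> List[Span]:
    """Stand-in critic: flags sensitive spans by simple keyword match."""
    spans = []
    for phrase in SAFE_REWRITES:
        idx = text.find(phrase)
        if idx != -1:
            spans.append((idx, idx + len(phrase)))
    return spans


def toy_rewriter(text: str, span: Span) -> str:
    """Stand-in rewriter: swaps the flagged span for a hedged alternative."""
    start, end = span
    return text[:start] + SAFE_REWRITES[text[start:end]] + text[end:]


def edit_loop(
    text: str,
    critic: Callable[[str], List[Span]],
    rewriter: Callable[[str, Span], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Critique and rewrite until no spans are flagged or the round cap is hit.

    Returns the final text and the number of rewrite rounds performed.
    """
    for round_idx in range(max_rounds):
        spans = critic(text)
        if not spans:
            return text, round_idx
        # Rewrite right-to-left so earlier offsets stay valid after each edit.
        for span in sorted(spans, reverse=True):
            text = rewriter(text, span)
    return text, max_rounds


draft = "The senator likely has diabetes and is a fraud."
final_text, rounds = edit_loop(draft, toy_critic, toy_rewriter)
print(final_text)
```

In this toy run the surrounding sentence is kept and only the flagged spans change, which mirrors the stated goal of preserving narrative flow rather than refusing to answer. Whether a rewrite expands the text or truncates it would depend on the rewriter model, which the sketch does not attempt to capture.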