This paper introduces a probabilistic framework for analyzing privacy leakage in generative AI agents that access enterprise data, focusing on the overlooked risk that responses reveal sensitive information from the datasets themselves. The authors define token-level and message-level differential privacy within this framework and derive privacy bounds in terms of generation parameters such as temperature and message length. They then formulate a privacy-utility optimization problem to determine the temperature setting that best balances privacy and utility.
Forget prompt privacy – your LLM's responses are leaking *enterprise data*, and this paper shows how to quantify and control it.
Large language models (LLMs) and AI agents are increasingly integrated into enterprise systems to access internal databases and generate context-aware responses. While such integration improves productivity and decision support, the model outputs may inadvertently reveal sensitive information. Although many prior efforts focus on protecting the privacy of user prompts, relatively few studies consider privacy risks from the enterprise data perspective. Hence, this paper develops a probabilistic framework for analyzing privacy leakage in AI agents based on differential privacy. We model response generation as a stochastic mechanism that maps prompts and datasets to distributions over token sequences. Within this framework, we introduce token-level and message-level differential privacy and derive privacy bounds that relate privacy leakage to generation parameters such as temperature and message length. We further formulate a privacy-utility design problem that characterizes optimal temperature selection.
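To make the link between temperature, message length, and leakage concrete, the sketch below illustrates one standard way such a bound can arise. It is not the paper's derivation, and the abstract does not state the actual bounds. The assumptions are ours: changing one record in the dataset shifts each next-token logit by at most `delta` (a hypothetical sensitivity parameter), tokens are sampled from a temperature-scaled softmax, and per-token guarantees are combined by basic sequential composition. Under these assumptions, a single token is `2 * delta / T`-differentially private (the usual exponential-mechanism argument), and a message of `n` tokens is at most `n * 2 * delta / T`.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def max_log_ratio(p, q):
    """Largest absolute log-probability ratio between two distributions."""
    return max(abs(math.log(a / b)) for a, b in zip(p, q))

def token_epsilon_bound(delta, temperature):
    """Per-token bound, ASSUMING each logit moves by at most `delta`
    between neighboring datasets (hypothetical sensitivity)."""
    return 2.0 * delta / temperature

def message_epsilon_bound(delta, temperature, length):
    """Message-level bound from basic sequential composition over tokens."""
    return length * token_epsilon_bound(delta, temperature)

# Illustrative logits for one decoding step under two neighboring datasets.
# These numbers are made up for the example.
logits_d = [2.0, 0.5, -1.0, 0.0]
logits_d_prime = [2.3, 0.2, -0.7, 0.1]
delta = max(abs(a - b) for a, b in zip(logits_d, logits_d_prime))

for temperature in (0.5, 1.0, 2.0):
    p = softmax(logits_d, temperature)
    q = softmax(logits_d_prime, temperature)
    observed = max_log_ratio(p, q)
    bound = token_epsilon_bound(delta, temperature)
    print(f"T={temperature}: observed={observed:.4f}, token bound={bound:.4f}, "
          f"20-token bound={message_epsilon_bound(delta, temperature, 20):.4f}")
```

In this toy setting, raising the temperature tightens the bound while flattening the output distribution, and longer messages loosen it linearly. That is the kind of privacy-utility tension the abstract describes, though the paper's own bounds and utility measure may differ.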