Search papers, labs, and topics across Lattice.
This paper introduces LCGuard, a framework to mitigate sensitive information leakage in multi-agent systems that communicate via shared key-value (KV) caches. LCGuard learns representation-level transformations of KV caches before sharing them, balancing task performance with the reduction of reconstructable sensitive information. Through adversarial training, LCGuard minimizes the ability of an adversary to reconstruct agent-specific sensitive inputs from the shared KV caches, demonstrating improved safety without significant performance degradation across various models and benchmarks.
Sharing key-value caches in multi-agent LLM systems leaks sensitive agent information, but LCGuard can protect it with representation-level transformations.
Large language model (LLM)-based multi-agent systems increasingly rely on intermediate communication to coordinate complex tasks. While most existing systems communicate through natural language, recent work shows that latent communication, particularly through transformer key-value (KV) caches, can improve efficiency and preserve richer task-relevant information. However, KV caches also encode contextual inputs, intermediate reasoning states, and agent-specific information, creating an opaque channel through which sensitive content may propagate across agents without explicit textual disclosure. To address this, we introduce \textbf{LCGuard} (Latent Communication Guard), a framework for safe KV-based latent communication in multi-agent LLM systems. LCGuard treats shared KV caches as latent working memory and learns representation-level transformations before cache artifacts are transmitted across agents. We formalize representation-level sensitive information leakage operationally through reconstruction: a shared cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it. This leads to an adversarial training formulation in which the adversary learns to reconstruct sensitive inputs, while LCGuard learns transformations that preserve task-relevant semantics and reduce reconstructable information. Empirical evaluations across multiple model families and multi-agent benchmarks show that LCGuard consistently reduces reconstruction-based leakage and attack success rates while maintaining competitive task performance compared to standard KV-sharing baselines.