This paper investigates whether latent states in latent-based multi-agent systems can carry attack-associated information that degrades performance during clean executions. The authors introduce a latent attack framework and show that the resulting latent-only attacks can substantially degrade task performance, especially when applied to inter-agent KV-cache handoffs rather than local hidden states. The findings suggest that latent representations, while making collaboration more efficient, shift part of the attack risk into less observable execution states, which calls for safeguards beyond visible-text inspection.
Latent-based collaboration may obscure attack risks, with latent-only attacks capable of severely degrading performance without any visible adversarial text.
Latent-based multi-agent systems replace parts of explicit inter-agent communication with hidden representations, offering a new direction for efficient and flexible agent collaboration. However, moving coordination into latent space may also move attacks beyond the reach of visible-text inspection. In this paper, we study whether latent states can carry attack-associated information that remains effective during clean executions. To examine this question, we introduce a latent attack framework that reactivates attack-induced effects through latent interventions without reusing adversarial text. Extensive experiments show that the resulting latent-only attacks can substantially degrade task performance in clean executions, especially when applied to inter-agent KV-cache handoffs rather than local hidden states. Further control analyses indicate that this degradation cannot be reduced to arbitrary perturbations or invalid generation. Overall, our findings suggest that latent-based collaboration does not remove attack risk. It shifts part of the risk into less observable execution states, calling for safeguards beyond visible-text inspection.
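To make the central concern concrete, the following is a minimal toy sketch, not the paper's attack framework. It assumes a single attention head in which a receiving agent reads from a key/value cache handed over by another agent, and it models a latent intervention as a fixed offset added to the cached values. The names `attend`, `shift_values`, `delta`, and `handoff_text` are all hypothetical and chosen only for illustration. The point is that the visible text is identical in both runs while the receiver's readout changes.

```python
import math
import random


def attend(query, keys, values):
    """Single-head softmax attention over a cached list of key/value vectors."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[i] for w, v in zip(weights, values)) / total
        for i in range(dim)
    ]


def shift_values(values, delta):
    """Return a copy of the cached values with `delta` added to every entry."""
    return [[v_i + d_i for v_i, d_i in zip(v, delta)] for v in values]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
dim, n_tokens = 4, 5

# Toy KV cache that a sending agent hands to a receiving agent.
keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
clean_values = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
query = [rng.gauss(0, 1) for _ in range(dim)]

# Assumption: the intervention is a fixed offset in value space. A real attack
# would derive this from model internals; here it is an arbitrary vector.
delta = [0.5, -0.5, 0.25, 0.0]

# The only thing a visible-text filter would inspect; it is the same in both runs.
handoff_text = "clean task description"

clean_out = attend(query, keys, clean_values)
tampered_out = attend(query, keys, shift_values(clean_values, delta))

print("visible text:", handoff_text)
print("readout shift:", distance(tampered_out, clean_out))
```

Because the attention weights sum to one, the receiver's readout in this toy moves by exactly `delta`, so the reported shift equals the norm of `delta` (0.75 here) up to floating-point error. The toy does not model task performance or the paper's control analyses; it only shows why inspecting `handoff_text` alone cannot distinguish the two runs.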