PKU, SEU | May 27, 2026 | arXiv:2605.28214

Out of Sight, Not Out of Mind: Unveiling Latent Attack in Latent-based Multi-Agent Systems

Chenxi Wang, Ruiyang Huang, Jiayan Sun, Lei Wei, Yifan Wu

AI Summary

This paper investigates whether latent states in latent-based multi-agent systems can carry attack-associated information that degrades task performance even during clean executions, that is, runs in which no adversarial text is present. The authors introduce a latent attack framework that reactivates attack-induced effects through latent interventions, and they show that these latent-only attacks can substantially impair task performance, particularly when applied to inter-agent KV-cache handoffs rather than to local hidden states. The findings suggest that while latent representations make collaboration more efficient, they also hide attack risks that call for safeguards beyond visible-text inspection.

Key Contribution

Latent-based collaboration may obscure attack risks, with latent-only attacks capable of severely degrading performance without any visible adversarial text.

Abstract

Latent-based multi-agent systems replace parts of explicit inter-agent communication with hidden representations, offering a new direction for efficient and flexible agent collaboration. However, moving coordination into latent space may also move attacks beyond the reach of visible-text inspection. In this paper, we study whether latent states can carry attack-associated information that remains effective during clean executions. To examine this question, we introduce a latent attack framework that reactivates attack-induced effects through latent interventions without reusing adversarial text. Extensive experiments show that the resulting latent-only attacks can substantially degrade task performance in clean executions, especially when applied to inter-agent KV-cache handoffs rather than local hidden states. Further control analyses indicate that this degradation cannot be reduced to arbitrary perturbations or invalid generation. Overall, our findings suggest that latent-based collaboration does not remove attack risk. It shifts part of the risk into less observable execution states, calling for safeguards beyond visible-text inspection.
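To make the two kinds of intervention in the abstract concrete, the sketch below contrasts a directed latent intervention on a KV-cache handoff with a norm-matched random perturbation of the kind a control analysis would use. It is a hypothetical toy, not the authors' framework. It assumes the handoff can be flattened to one (tokens x d_model) array, which is a simplification of real per-layer, per-head key/value caches. It also assumes the "attack-induced effect" is captured as a difference of means between latent states recorded with and without adversarial text, a common activation-steering construction swapped in here for illustration. All function and variable names are illustrative.

```python
import numpy as np

# Hypothetical toy: a "KV-cache handoff" is a (num_tokens, d_model) array that
# the sender agent passes to the receiver. Real caches are per-layer, per-head.

def attack_direction(attacked_states: np.ndarray, clean_states: np.ndarray) -> np.ndarray:
    """Difference of means between latent states recorded with and without
    adversarial text (an assumed stand-in for the attack-induced effect)."""
    return attacked_states.mean(axis=0) - clean_states.mean(axis=0)

def intervene(handoff: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Latent-only attack: add alpha * direction at every token position of a
    clean handoff. No adversarial text is involved at this point."""
    return handoff + alpha * direction  # broadcasts over the token axis

def norm_matched_random(handoff: np.ndarray, direction: np.ndarray, alpha: float,
                        rng: np.random.Generator) -> np.ndarray:
    """Control: a random direction rescaled to the same L2 norm as the attack
    direction, so any difference in effect is not just perturbation size."""
    noise = rng.standard_normal(direction.shape)
    noise *= np.linalg.norm(direction) / np.linalg.norm(noise)
    return handoff + alpha * noise

# Usage with synthetic data (a hypothetical constant shift plays the attack).
rng = np.random.default_rng(0)
d_model, n_tokens = 64, 8
clean_states = rng.standard_normal((32, d_model))
attacked_states = clean_states + 0.5
handoff = rng.standard_normal((n_tokens, d_model))

d = attack_direction(attacked_states, clean_states)
steered = intervene(handoff, d, alpha=1.0)
control = norm_matched_random(handoff, d, alpha=1.0, rng=rng)

unit = d / np.linalg.norm(d)

def shift_along_attack(x: np.ndarray) -> float:
    """Mean displacement of the handoff, projected onto the attack direction."""
    return float((x - handoff).mean(axis=0) @ unit)

print(shift_along_attack(steered), shift_along_attack(control))
```

Both perturbations have the same size, but only the directed one moves the handoff consistently along the attack direction. That property is what a control analysis can check to argue that an observed degradation is not explained by arbitrary perturbations alone. In this toy, measuring actual task degradation would additionally require a receiver model, which is omitted here.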

Red-Teaming & Adversarial Robustness | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
