Shanghai Qizhi InstituteSJTUState Key Laboratory of CryptologyMay 6, 2026arXiv:2605.04901

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

Zhengyi Li, Yakai Wang, Kang Yang, Yu Yu, Jiaping Gui, Yu Feng, Ning Liu, Minyi Guo, Jingwen Leng

AI Summary

This paper analyzes the security of a shuffling defense used in secure Transformer inference, which aims to protect model weights by revealing only randomly permuted activations to the client. The authors demonstrate a novel attack that aligns differently shuffled activations to a common permutation, enabling the extraction of model weights despite the shuffling defense. Experiments on Pythia-70m and GPT-2 show successful weight recovery with minimal query costs, highlighting the vulnerability of the shuffling defense.

Key Contribution

Shuffling activations, a popular defense in secure Transformer inference, crumbles under a new alignment attack that recovers model weights for just $1.

Abstract

For Transformer models, cryptographically secure inference ensures that the client learns only the final output, while the server learns nothing about the client's input. However, securely computing nonlinear layers remains a major efficiency bottleneck due to the substantial communication rounds and data transmission required. To address this issue, prior works reveal intermediate activations to the client, allowing nonlinear operations to be computed in plaintext. Although this approach significantly improves efficiency, exposing activations enables adversaries to extract model weights. To mitigate this risk, existing works employ a shuffling defense that reveals only randomly permuted activations to the client. In this work, we show that the shuffling defense is not as robust as previously claimed. We propose an attack that aligns differently shuffled activations to a common permutation and subsequently exploits them to extract model weights. Experiments on Pythia-70m and GPT-2 demonstrate that the proposed attack can align shuffled activations with mean squared errors ranging from $10^{-9}$ to $10^{-6}$. With a query cost of approximately \$1, the adversary can recover model weights with L1-norm differences ranging from $10^{-4}$ to $10^{-2}$ compared to the oracle weights.

Architecture Design (Transformers, SSMs, MoE)Inference & Quantization Natural Language Processing

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

Related Papers