Shenzhen Loop Area Institute, The Chinese University of Hong Kong, Shenzhen
Corresponding author: maluzhang@uestc.edu.cn

Abstract

Spiking Neural Networks (SNNs) represent a promising paradigm for energy-efficient neuromorphic computing due to their bio-plausible and spike-driven characteristics. However, the robustness of SNNs in complex adversarial environments remains significantly constrained. In this study, we theoretically demonstrate that threshold-neighboring spiking neurons are the key factor limiting the robustness of directly trained SNNs. We find that these neurons set the upper limit on the maximum potential strength of adversarial attacks and are prone to state-flipping under minor disturbances. To address this challenge, we propose a Threshold Guarding Optimization (TGO) method, which comprises two key components. First, we incorporate additional constraints into the loss function to move neurons' membrane potentials away from their thresholds. This increases the gradient sparsity of SNNs, thereby reducing the theoretical upper bound of adversarial attacks. Second, we introduce noisy spiking neurons to transition the neuronal firing mechanism from deterministic to probabilistic, decreasing the probability of state-flipping under minor disturbances. Extensive experiments conducted in standard adversarial scenarios show that our method significantly enhances the robustness of directly trained SNNs. These findings pave the way for more reliable and secure neuromorphic computing in real-world applications.

1 Introduction

Spiking Neural Networks (SNNs) (Maass, 1997; Gerstner & Kistler, 2002; Izhikevich, 2003; Masquelier et al., 2008) mimic biological information transmission mechanisms, using discrete spikes as the medium for information exchange, and represent the cutting edge of neural computation (Cao et al., 2020; Varghese et al., 2016). Spiking neurons fire spikes only upon activation and remain silent otherwise. This event-driven mechanism (Liu & Yue, 2018) promotes sparse synaptic operations and avoids multiply-accumulate (MAC) operations, significantly enhancing energy efficiency on neuromorphic platforms (Pei et al., 2019; DeBole et al., 2019; Ma et al., 2024). Recently, directly training SNNs with surrogate gradient methods (Wu et al., 2018; 2019; Deng et al., 2022; Li et al., 2021; Wang et al., 2025a) has significantly reduced their performance gap with ANNs in classification tasks (Yao et al., 2024a; Shi et al., 2024; Zhou et al., 2024; Wang et al., 2025c; Liang et al., 2025). However, these directly trained SNNs rely on Backpropagation Through Time (BPTT) (Werbos, 1990), thereby inheriting significant robustness issues associated with ANNs. Directly trained SNNs (Fang et al., 2021b; Zhou et al., 2023; Bu et al., 2022; Duan et al., 2022) using surrogate gradient methods often exhibit a strong dependency on specific patterns or features (Ding et al., 2022; Mukhoty et al., 2024), rendering them particularly sensitive to minor disturbances. This characteristic reduces robustness in complex environments, especially against finely crafted adversarial disturbances (Laskov & Lippmann, 2010). To enhance the robustness of SNNs against adversarial attacks, researchers have adapted strategies from ANNs, such as adversarial training (Ho et al., 2022; Ding et al., 2022) and certified training (Zhang et al., 2019; Liang et al., 2022).
Furthermore, researchers have developed optimization methods tailored to spike-driven mechanisms, integrating them with adversarial training to enhance robustness. Some researchers (Sharmin et al., 2020; Ding et al., 2023; El-Allami et al., 2021) utilize the temporal characteristics of SNNs to counteract environmental white-noise attacks. Additionally, the Evolutionary Leak Factor (Xu et al., 2024), MPD-SGR (Jiang et al., 2025), and gradient sparsity regularization (SR) (Liu et al., 2024) significantly enhance the robustness of SNNs against gradient-based attacks. However, a comprehensive and unified analysis of the robustness bottlenecks in directly trained SNNs remains lacking.

In this study, we theoretically demonstrate that threshold-neighboring spiking neurons are a key factor influencing the robustness of directly trained SNNs under adversarial attacks. We find that these neurons provide the maximum potential pathways for adversarial attacks and are more prone to state-flipping under minor disturbances. To address this, we propose a Threshold Guarding Optimization (TGO) method. The TGO method aims to: (1) maximize the distance between neurons' membrane potentials and their thresholds to enhance gradient sparsity; and (2) minimize the probability of state-flipping in neurons under minor disturbances. A series of experiments in standard adversarial scenarios demonstrates that our TGO method significantly enhances the robustness of directly trained SNNs. The contributions of this work are summarized as follows:

• We theoretically demonstrate that threshold-neighboring spiking neurons are critical in limiting the robustness of directly trained SNNs under adversarial attacks. These neurons set the upper limit on the maximum potential strength of adversarial attacks and are prone to state-flipping under minor disturbances.

• We propose a Threshold Guarding Optimization (TGO) method that minimizes threshold-neighboring neurons' sensitivity to adversarial attacks. First, we integrate additional constraints into the loss function to distance the membrane potential from the threshold. Second, we introduce noisy spiking neurons to transition neuronal firing from deterministic to probabilistic, reducing the probability of state flips due to minor disturbances.

• We validate the effectiveness of the TGO method across various adversarial attack scenarios using different training strategies. Extensive experiments demonstrate that the TGO method achieves state-of-the-art (SOTA) performance under multiple adversarial attacks, significantly enhancing the robustness of SNNs. Notably, the TGO method incurs no additional computational overhead during inference, providing a feasible pathway toward robust edge intelligence.

2 Related Work

Spiking Neural Networks: SNNs offer a promising solution for resource-constrained edge computing (Zhang et al., 2023). To enhance the performance of SNNs, Wu et al. (2018) introduce the spatial-temporal backpropagation (STBP) algorithm, an adaptation of BPTT from Recurrent Neural Networks (RNNs) (Graves & Graves, 2012; Lipton, 2015). This method uses surrogate functions to approximate the non-differentiable Heaviside step function in spiking neurons. Additionally, researchers have explored parallel training strategies (Fang et al., 2024) within the ResNet framework (He et al., 2016), shortcut residual connections (Zheng et al., 2021; Hu et al., 2021; Lee et al., 2020; Fang et al., 2021a), and Spike Transformers (Li et al., 2022).
Notably, SNNs have achieved performance comparable to their ANN counterparts across diverse vision tasks, including image classification (Yao et al., 2024a; Xiao et al., 2025), object detection (Wang et al., 2025b), and semantic segmentation (Zhang et al., 2025). Although surrogate gradient methods (Deng et al., 2023; Yang & Chen, 2023) significantly improve training efficiency, SNNs remain as susceptible to adversarial attacks as ANNs (Finlayson et al., 2019; Xu et al., 2020), limiting their applicability in adversarial environments.

Robustness of SNNs in Adversarial Attacks: While biologically inspired event-driven mechanisms (Marchisio et al., 2020; Hao et al., 2020) enhance SNNs' adaptability in complex environments, empirical studies (Liang et al., 2021; El-Allami et al., 2021) reveal that directly trained SNNs remain vulnerable to adversarial attacks. Initial efforts to mitigate this vulnerability start with adapting Adversarial Training (AT) (Goodfellow et al., 2014; Kundu et al., 2021) and subsequently advance to Regularized Adversarial Training (RAT) (Ding et al., 2022) with Lipschitz analysis. However, these approaches are constrained by additional training overhead and limited portability (Shafahi et al., 2019). Recently, researchers have developed optimization methods tailored to the spike-driven mechanisms of SNNs. For example, Hao et al. (2023) enhance intrinsic robustness through rate-temporal information integration, Xu et al. (2024) introduce FEEL-SNN with random membrane potential decay and innovative encoding mechanisms, and Ding et al. (2024b) develop gradient SR to strengthen defenses against the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014) and Projected Gradient Descent (PGD) (Madry, 2017). Despite these advances, these strategies achieve significant enhancements only through synergistic integration with AT and RAT strategies. Moreover, a theoretical analysis of SNNs' inherent vulnerabilities in adversarial environments is still lacking. Thus, devising more effective robustness optimization strategies for SNNs remains an active research focus.

3 Preliminaries

3.1 Surrogate Gradient for Directly Trained SNNs

SNNs effectively model the complex dynamics of biological neurons. Within the Leaky Integrate-and-Fire (LIF) framework, the membrane potential transitions through three key stages: integration, leakage, and firing. During integration, the membrane potential $V[t]$ accumulates over time in response to incoming spikes. When $V[t]$ exceeds a predefined threshold $V_{\mathrm{th}}$, the neuron fires a spike that may influence downstream neurons. Following the spike, the membrane potential is reset to a specified baseline $V_{\mathrm{reset}}$, preparing the neuron for subsequent inputs. This process can be described as:

$$V[t] = \tau U[t-1] + W S[t], \quad (1)$$

$$S[t] = \Theta\left(V[t] - V_{\mathrm{th}}\right), \quad (2)$$

$$U[t] = V[t]\left(1 - S[t]\right) + V_{\mathrm{reset}} S[t], \quad (3)$$

where $\tau$ is the membrane time constant, $W$ represents the synaptic weights, $S[t]$ denotes the spike at time $t$, and $\Theta(\cdot)$ is the Heaviside step function, indicating firing when $V[t]$ exceeds $V_{\mathrm{th}}$.
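To make the discrete LIF dynamics of Eqs. (1)-(3) concrete, the following is a minimal PyTorch-style sketch of a single LIF layer unrolled over time. The layer interface, tensor shapes, and the rectangular surrogate window width are illustrative assumptions, not the paper's implementation.

```python
import torch


class RectSurrogate(torch.autograd.Function):
    """Heaviside step in the forward pass; rectangular surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, v_minus_th):
        ctx.save_for_backward(v_minus_th)
        return (v_minus_th >= 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v_minus_th,) = ctx.saved_tensors
        # Gradient is nonzero only inside an (assumed) window of width 1 around the threshold.
        return grad_out * (v_minus_th.abs() < 0.5).float()


def lif_forward(inputs, weight, tau=0.5, v_th=1.0, v_reset=0.0):
    """Unroll Eqs. (1)-(3). inputs: [T, batch, n_in] presynaptic spikes; weight: [n_out, n_in]."""
    u = torch.zeros(inputs.shape[1], weight.shape[0], device=inputs.device)  # U[t-1], post-reset potential
    spikes = []
    for t in range(inputs.shape[0]):
        v = tau * u + inputs[t] @ weight.t()            # Eq. (1): leak + integrate
        s = RectSurrogate.apply(v - v_th)               # Eq. (2): threshold firing
        u = v * (1.0 - s) + v_reset * s                 # Eq. (3): reset after a spike
        spikes.append(s)
    return torch.stack(spikes)                          # output spike train S[1..T]
```

Note that the surrogate gradient above is nonzero only when the membrane potential lies within a window around the threshold, which is precisely the threshold-neighboring regime analyzed in Section 4.1.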
In directly trained SNNs, the gradient of the total loss $L$ with respect to the weights $W$ can be expressed as:

$$\frac{\partial L}{\partial W} = \sum_{t} \frac{\partial L}{\partial S[t]} \frac{\partial S[t]}{\partial V[t]} \frac{\partial V[t]}{\partial W}, \quad (4)$$

where $\frac{\partial S[t]}{\partial V[t]}$ is the gradient of the non-differentiable step function, involving the derivative of the Dirac $\delta$-function, and is typically replaced by a surrogate gradient with a differentiable curve. Various forms of surrogate gradients have been utilized, such as rectangular (Wu et al., 2018; 2019), triangular (Esser et al., 2016; Rathi & Roy, 2020), and exponential (Shrestha & Orchard, 2018) curves. Surrogate gradients provide a differentiable approximation to non-differentiable functions.

3.2 Adversarial Attacks

Adversarial attacks, including the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014) and Projected Gradient Descent (PGD) (Madry, 2017), rely on the model's gradient information to craft adversarial examples. FGSM generates such examples by applying a single-step perturbation designed to maximize the model's prediction error. The adversarial input is calculated as:

$$\mathbf{x}_{\text{adv}} = x + \epsilon \cdot \text{sign}\left(\nabla_{x}\mathcal{L}(x, y_{\text{true}})\right), \quad (5)$$

where $\mathbf{x}_{\text{adv}}$ is the adversarial example, $x$ is the original input, $\epsilon$ is the perturbation magnitude, $\mathcal{L}(x, y_{\text{true}})$ is the loss function, and $\text{sign}(\nabla_{x}\mathcal{L}(x, y_{\text{true}}))$ gives the sign of the gradient with respect to the input. This process leverages the model's loss landscape to introduce minimal disturbances that significantly increase the classification error. Building on this, PGD iteratively refines adversarial examples by applying gradient updates and projecting them back into a bounded $\epsilon$-ball centered on the original input. The attack rule of PGD can be expressed as:

$$\mathbf{x}_{\text{adv}}^{t+1} = \text{Clip}_{x,\epsilon}\left(\mathbf{x}_{\text{adv}}^{t} + \alpha \cdot \text{sign}\left(\nabla_{x}\mathcal{L}(\mathbf{x}_{\text{adv}}^{t}, y_{\text{true}})\right)\right), \quad (6)$$

where $\mathbf{x}_{\text{adv}}^{t}$ is the adversarial example at iteration $t$, $\alpha$ is the step size, and $\text{Clip}_{x,\epsilon}$ ensures that the perturbation remains within the prescribed $\epsilon$-ball. PGD employs a multi-step approach to explore the disturbance space more precisely, producing adversarial examples closer to the optimal solution. At the same time, it strictly constrains the magnitude of disturbances, ensuring the perturbed input remains nearly indistinguishable from the original data to human observers. Multi-PGD improves robustness evaluation by reducing sensitivity to initialization and increasing the likelihood of finding stronger adversarial examples. APGD (Auto-PGD) automates and stabilizes the attack by adaptively adjusting hyperparameters such as step size and scheduling, enabling consistently strong performance with minimal manual tuning. These methods serve as standard benchmarks for evaluating the adversarial defense capabilities of neural networks.
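As a reference point for Eqs. (5)-(6), the sketch below implements the standard FGSM and PGD attacks in PyTorch. It assumes `model` wraps the SNN's temporal loop and returns class logits, that inputs live in $[0, 1]$, and that a cross-entropy loss is used; gradients flow through the surrogate functions described above.

```python
import torch
import torch.nn.functional as F


def fgsm_attack(model, x, y, eps):
    """Single-step FGSM, Eq. (5): move the input along the sign of the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()


def pgd_attack(model, x, y, eps, alpha, steps):
    """Multi-step PGD, Eq. (6): iterate signed-gradient updates and project into the eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # Clip_{x,eps}: stay in the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                           # keep a valid input range
    return x_adv.detach()
```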
4 Methods

4.1 Robustness Analysis of Directly Trained SNNs

Figure 1: Red traces represent membrane potential dynamics of spiking neurons under adversarial attack. Only membrane potentials near the threshold undergo spike pattern transitions, while others remain unchanged.

To explore the key factors affecting the adversarial robustness of SNNs, we conduct a detailed analysis of the dynamic properties of SNNs under adversarial attacks. Our findings highlight two critical vulnerabilities associated with threshold-neighboring spiking neurons. First, they establish a theoretical upper bound for the maximum potential strength of adversarial attacks. Second, they exhibit a higher probability of state-flipping under minor disturbances.

Maximum Potential Gradient-Based Attack Path: Adversarial attacks strategically modify input disturbances to maximize the expected loss, with these disturbances typically aligning with the gradient of the input data. The metric $\mathcal{R}_{\text{adv}}(f, x, \epsilon)$ quantifies the maximum potential strength of an adversarial attack on the neural network $f$ at a specific input $x$, where the disturbances are constrained within a unit $\ell_{p}$-norm ball and scaled by the factor $\epsilon$. This measure is expressed as:

$$\mathcal{R}_{\text{adv}}(f, x, \epsilon) = \max_{\|\delta\|_{p} \leq 1} \|f(x + \epsilon\delta) - f(x)\|_{2}^{2}. \quad (7)$$

Eq. (7) seeks the disturbance $\delta$ that maximizes the squared output difference while remaining within the $\ell_{p}$-norm ball. Applying a Taylor expansion with the Lagrange remainder, we obtain:

$$f(x + \epsilon\delta) = f(x) + J_{f}(x)(\epsilon\delta) + \frac{(\epsilon\delta)^{2}}{2} H_{f}(x + \xi\epsilon\delta), \quad (8)$$

where $\xi \in (0, 1)$ is the expansion coefficient. Let $f: \mathbb{R}^{n} \to \mathbb{R}^{m}$ be a neural network function that is continuously differentiable at point $x$, and let $\epsilon > 0$ be sufficiently small. The matrices $H_{f}(x)$ and $J_{f}(x)$ denote the Hessian and Jacobian of $f(\cdot)$ at point $x$, respectively. Using the Cauchy-Schwarz inequality (Steele, 2004) and assuming $\|\delta\|_{p} \leq 1$, the upper bound of the output difference caused by a minor disturbance can be expressed as:

$$\|f(x + \epsilon\delta) - f(x)\|_{2}^{2} \leq \left(\epsilon \|J_{f}(x)\|_{2} + \frac{\epsilon^{2}}{2} \lambda_{\text{Hmax}}\right)^{2}, \quad (9)$$

which captures the sensitivity of $f$ at $x$ to disturbances along $\delta$. Here $\|J_{f}(x)\|_{2}$ is the $\ell_{2}$ norm of the Jacobian matrix, and $\lambda_{\text{Hmax}}$ is the maximum eigenvalue of $H_{f}(x)$. We then derive the upper bound on $\mathcal{R}_{\text{adv}}$:

$$\mathcal{R}_{\text{adv}}(f, x, \epsilon) \leq \epsilon^{2} \|J_{f}(x)\|_{2}^{2} + O(\epsilon^{2}). \quad (10)$$

The squared norm of the Jacobian matrix $J_{f}(x)$ can be expressed in terms of the gradients of the individual output components:

$$\|J_{f}(x)\|_{2}^{2} = \lambda_{\text{Jmax}}\left(\sum_{i=1}^{m} \nabla f_{i}(x) \nabla f_{i}(x)^{T}\right). \quad (11)$$

According to Eq. (11), the sensitivity of SNNs to adversarial disturbances is correlated with the $\ell_{2}$ norm of their Jacobian matrix: a higher gradient $\ell_{2}$ norm indicates greater susceptibility to adversarial attacks. Notably, directly trained SNNs typically rely on surrogate gradients, which peak near the threshold. As the number of threshold-neighboring spiking neurons increases, the $\ell_{2}$ norm of the gradients in SNNs also rises, thereby enlarging $\mathcal{R}_{\text{adv}}(f, x, \epsilon)$. Consequently, these neurons significantly raise the theoretical upper limit on adversarial perturbation strength. Details can be found in Appendix B.
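The sensitivity term $\|J_{f}(x)\|_{2}^{2}$ in Eqs. (10)-(11) can be probed numerically. The sketch below builds the input-output Jacobian with autograd and takes its largest singular value; this is a small-scale illustration only (the full Jacobian is tractable just for small inputs and models), and the function name is an assumption rather than the paper's code.

```python
import torch


def adversarial_sensitivity(model, x):
    """Estimate the ||J_f(x)||_2^2 factor in the bound of Eq. (10).

    Builds the Jacobian of the model output with respect to the input and returns the
    square of its spectral norm; surrogate gradients make the SNN differentiable here.
    """
    x = x.detach()
    J = torch.autograd.functional.jacobian(model, x)   # shape: output_shape + input_shape
    J = J.reshape(-1, x.numel())                        # flatten to an (m x n) matrix
    sigma_max = torch.linalg.svdvals(J)[0]              # spectral norm ||J_f(x)||_2
    return sigma_max ** 2                                # enters Eq. (10) as eps^2 * ||J_f(x)||_2^2


# Intuition: the more neurons sit inside the surrogate-gradient window around V_th,
# the larger sigma_max becomes, i.e., the looser the robustness bound of Eq. (10).
```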
Strong State-Flipping Probability: Adversarial attacks introduce carefully crafted small disturbances into the input data to achieve their disruptive effects. These disturbances propagate through multiple layers, causing state-flipping in spiking neurons and ultimately altering the final output. Due to the spike-driven nature of SNNs, changes occur only when a spiking neuron's membrane potential crosses the threshold.

Theorem 1. Let $V[t]$ be the membrane potential, $V_{\mathrm{th}}$ the threshold, and $\eta[t] \sim \mathcal{N}(0, \sigma^{2})$ a random perturbation. The probability $P_{\mathrm{flip}}$ of a neuron flipping its state is given by:

$$P_{\mathrm{flip}} = \begin{cases} \Phi\left(\dfrac{V_{\mathrm{th}} - V[t]}{\sigma}\right), & \text{if } V[t] \geq V_{\mathrm{th}}, \\[2ex] 1 - \Phi\left(\dfrac{V_{\mathrm{th}} - V[t]}{\sigma}\right), & \text{if } V[t] < V_{\mathrm{th}}, \end{cases}$$

where $\Phi$ denotes the cumulative distribution function (CDF) of the standard normal distribution.

Theorem 1 defines the relationship between a neuron's membrane potential and its state-flipping probability. Specifically, when $V[t] \geq V_{\mathrm{th}}$, the neuron output switches from 1 to 0, and when $V[t] < V_{\mathrm{th}}$, it flips from 0 to 1. Since the CDF of the standard normal distribution $\Phi(\cdot)$ is monotonically increasing, $P_{\mathrm{flip}}$ increases as the membrane potential $V[t]$ approaches the threshold $V_{\mathrm{th}}$, regardless of whether $V[t]$ is above or below $V_{\mathrm{th}}$. As shown in Fig. 1, small noise perturbations mainly cause state flips in neurons near the threshold, while fluctuations elsewhere have little effect on spike outputs. Therefore, state flipping directly increases the instability of activation patterns, and its impact on the upper bound of adversarial attacks is summarized as follows.

Theorem 2. For a discrete spike pattern mapping $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$, small perturbations $\epsilon\delta$ around input $x$ induce a finite set of activation pattern transitions. The adversarial robustness upper bound can be approximated as:

$$\mathcal{R}_{\text{adv}}(f, x, \epsilon) \leq \epsilon^{2} \max_{1 \leq k \leq K} \|A_{\mathcal{A}_{k}}\|_{p \to 2}^{2},$$

where $K$ denotes the number of activation regions intersecting the perturbation ball $B_{\epsilon}(x)$, and $A_{\mathcal{A}_{k}} \in \mathbb{R}^{m \times n}$ is the affine transformation matrix for activation pattern $\mathcal{A}_{k} = \{(l, i): u_{i}(x) \geq \theta_{i}\}$.

Theorem 2 shows that a larger $K$ expands the set of possible transformations (represented by the affine matrices $A_{\mathcal{A}_{k}}$), thereby increasing the potential impact of adversarial perturbations on the system. As the proportion of threshold-neighboring neurons increases, $K$ grows, leading to heightened adversarial vulnerability.

In summary, threshold-neighboring spiking neurons play a crucial role in the adversarial robustness of SNNs. To address this, we propose an optimization strategy designed to mitigate their impact, thereby strengthening the overall resilience of SNNs in adversarial environments.
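The following is a small numerical sketch of Theorem 1 together with one possible realization of probabilistic firing (Gaussian noise added to the membrane potential before thresholding). The flip-probability formula follows the theorem directly; the noisy firing step and the noise scale are illustrative assumptions and may differ from TGO's exact noisy-neuron formulation.

```python
import torch
from torch.distributions import Normal


def flip_probability(v, v_th=1.0, sigma=0.1):
    """Theorem 1: probability that noise eta ~ N(0, sigma^2) flips a neuron's spike state."""
    phi = Normal(0.0, 1.0).cdf((v_th - v) / sigma)
    return torch.where(v >= v_th, phi, 1.0 - phi)


def noisy_lif_fire(v, v_th=1.0, sigma=0.1):
    """One way to make firing probabilistic: perturb the membrane potential before thresholding."""
    eta = sigma * torch.randn_like(v)
    return (v + eta >= v_th).float()


# Flip probability peaks for threshold-neighboring potentials and vanishes far from V_th:
v = torch.tensor([0.2, 0.9, 0.99, 1.01, 1.5])
print(flip_probability(v))   # approx [0.000, 0.159, 0.460, 0.460, 0.000]
```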
4.2 Threshold Guarding Optimization Method

4.2.1 Membrane Potential Constraints

The surrogate gradients of threshold-neighboring spiking neurons significantly influence $\|J_{f}(x)\|_{2}^{2}$ in SNNs. To mitigate this effect, we propose additional constraints at each spiking neuron layer that optimize the membrane potential distribution, keeping it as distant as possible from the threshold.

Figure 2: Mechanism and working principle of the TGO method. (a) The TGO method combines membrane potential constraints with noisy LIF neuron models for adversarial defense. (b) Gradient-based adversarial attacks illustrate how disturbances affect input images. (c) The joint optimization of the objective and constraint functions drives neuron membrane potentials away from the firing threshold. (d) The noisy LIF model effectively reduces the probability of state flips caused by small input disturbances, enhancing model stability.

The membrane potential constraint function $\mathcal{C}(V_{l}[t])$ can be described as: