Shuai Wang

$$\mathcal{C}(V(t)_{l})=\frac{1}{TN}\sum_{i=1}^{n}\max\left(0,\delta-\left|V(t)_{i}-V_{\text{th}}\right|\right) \tag{12}$$

$\mathcal{C}(V(t)_{l})$ computes the average penalty incurred when the membrane potentials $V(t)_{i}$ of neurons in layer $l$ approach the firing threshold $V_{\text{th}}$. $N$ and $T$ represent the number of time steps and the total number of layers in the SNN, respectively. The hyperparameter $\delta$ establishes a margin around $V_{\text{th}}$ within which nearby potentials incur a penalty proportional to how far they intrude into the margin.

Subsequently, we integrate this constraint with the target loss function, defining the overall loss within the framework of Lagrangian constraints (Kim & Jeong, 2021; Yoo & Jeong, 2023), which can be expressed as:

$$\mathcal{L}(\mathbf{x},\lambda)=\mathcal{L}oss(\mathbf{x})+\lambda\sum_{l}\mathcal{C}(V(t)_{l}). \tag{13}$$

Here, $\mathcal{L}oss(\mathbf{x})$ is the original loss function, $\sum_{l}\mathcal{C}(V(t)_{l})$ is the penalty on the membrane potentials summed across all layers, and $\lambda$ is a dynamically adjusted parameter that controls the weight of the constraint. We find that using a fixed magnitude for $\lambda$ hinders both network convergence and constraint satisfaction. Specifically, a larger $\lambda$ leads to significant performance degradation and poor convergence during the initial training phase, while a smaller $\lambda$ fails to enforce the constraint effectively. Therefore, to balance gradient sparsity against performance, we propose a dynamic $\lambda$, which can be described as: λ=0.
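As a rough illustration of Eqs. (12) and (13), the following NumPy sketch computes the margin penalty for one layer and adds the weighted sum over layers to a task loss. It is not the authors' implementation. The function names `threshold_margin_penalty` and `constrained_loss`, the default values `v_th=1.0` and `delta=0.1`, and the choice to store each layer's potentials as a single array whose element count stands in for the $TN$ normaliser are all assumptions made for the example. Because the excerpt breaks off before giving the dynamic $\lambda$ rule, `lam` is simply passed in by the caller.

```python
import numpy as np


def threshold_margin_penalty(v, v_th=1.0, delta=0.1):
    """Average margin penalty for one layer, in the spirit of Eq. (12).

    v is assumed to hold every membrane potential recorded for the layer
    (for example one row per time step, one column per neuron). Potentials
    closer to v_th than delta are penalised linearly; all others cost 0.
    """
    v = np.asarray(v, dtype=float)
    violation = np.maximum(0.0, delta - np.abs(v - v_th))
    # Assumption: the element count of v plays the role of the TN normaliser.
    return violation.sum() / v.size


def constrained_loss(task_loss, layer_potentials, lam, v_th=1.0, delta=0.1):
    """Task loss plus lam times the summed per-layer penalties, as in Eq. (13).

    lam is supplied by the caller; the source's dynamic schedule is not
    reproduced here because the excerpt is truncated before it is stated.
    """
    penalty = sum(
        threshold_margin_penalty(v, v_th, delta) for v in layer_potentials
    )
    return task_loss + lam * penalty


if __name__ == "__main__":
    # Two of the four potentials fall inside the margin around v_th = 1.0.
    layer = np.array([[1.0, 0.95], [0.5, 1.2]])
    print(threshold_margin_penalty(layer))
    print(constrained_loss(2.0, [layer, layer], lam=0.5))
```

In a real training loop the same expressions would be written with a differentiable array library so that the penalty contributes gradients; NumPy is used here only to keep the arithmetic easy to inspect.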

Research focus

Architecture Design (Transformers, SSMs, MoE) (2); Speech & Audio (2); Reasoning & Chain-of-Thought (1); Tool Use & Agents (1)

Frequent co-authors

Jiayi Chen (1), Shuai Wang (1), Guangxu Zhu (1), Chengzhong Xu (1)

Papers (3)

Apr 2, 2026


Bridging Large-Model Reasoning and Real-Time Control via Agentic Fast-Slow Planning

By decoupling high-level reasoning from low-level control, Agentic Fast-Slow Planning enables more robust autonomous navigation, improving lateral deviation by up to 45% and completion time by over 12% compared to traditional MPC methods.

Jiayi Chen, Shuai Wang, Shuai Wang +2

Reasoning & Chain-of-Thought; Tool Use & Agents; World Models & Planning

Mar 11, 2026 · also SJTU

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition

G-STAR tackles long-form, multi-speaker ASR by giving Speech-LLMs time-aware speaker tracking, enabling robust identity linking across chunks.

Jing Peng, Ziyi Chen, Haoyu Li +7

Architecture Design (Transformers, SSMs, MoE); Natural Language Processing; Speech & Audio

Mar 11, 2026 · also Shenzhen Loop Area Institute

AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow

Ditch slow, multi-step sampling for target speaker extraction: AlphaFlowTSE achieves faster, one-step generation with improved speaker similarity and real-world generalization.

Duojia Li, Shuhan Zhang, Zihan Qian +4

Architecture Design (Transformers, SSMs, MoE); Speech & Audio
