This paper introduces Spike-NVPT, a visual prompt tuning method that makes pre-trained vision models more robust to input noise through a Signal Filtering Layer built from spiking neurons. The layer uses an integrate-and-fire mechanism to filter out transient noise, and a Spike Discretization Unit converts the filtered signals into sparse binary prompts that act as a strong regularizer. In the reported experiments, Spike-NVPT improves robustness over conventional methods by up to 11.2% while maintaining competitive accuracy on clean datasets. The authors describe it as the first use of spiking neurons for fine-tuning ANN-based visual models.
Noise-robust visual prompts improve robustness by up to 11.2% over conventional methods without adding inference cost.
Pre-trained vision models have found widespread application across diverse domains. Prompt tuning-based methods have emerged as a parameter-efficient paradigm for adapting pre-trained vision models. While effective on standard benchmarks, the continuous and dense nature of learned prompts can make them sensitive to input noise, as the high-capacity prompts tend to overfit task-irrelevant details. To address this issue, we propose Spike-NVPT, a noise-robust visual prompt tuning method. Specifically, we design a Signal Filtering Layer based on spiking neurons, which uses the integrate-and-fire (IF) mechanism to accumulate task-relevant signals over time and filter out transient noise fluctuations. A subsequent Spike Discretization Unit converts the filtered signals into sparse binary prompts. This discretization acts as a strong regularizer, forcing the model to anchor decision boundaries on the most discriminative and robust features. Notably, the resulting binary prompts remain static during deployment, ensuring zero additional computational overhead during inference. Experimental results demonstrate that Spike-NVPT achieves superior robustness, with a maximum improvement of 11.2% over conventional methods, while retaining competitive accuracy on clean datasets. To the best of our knowledge, this is the first attempt to leverage spiking neurons for fine-tuning traditional artificial neural network (ANN)-based visual models.
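The integrate-and-fire idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `integrate_and_fire` and `binary_prompt`, the threshold of 1.0, the subtractive reset, and the "fired at least once" discretization rule are all assumptions made for illustration, since the abstract does not specify these details.

```python
def integrate_and_fire(inputs, threshold=1.0):
    """Run one integrate-and-fire neuron over a sequence of input values.

    Returns a list with one 0/1 spike per time step.
    The subtractive reset is an assumption, not taken from the paper.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        membrane += current              # integrate the input over time
        if membrane >= threshold:
            spikes.append(1)             # fire once the threshold is reached
            membrane -= threshold        # reset by subtracting the threshold
        else:
            spikes.append(0)
    return spikes


def binary_prompt(signals, threshold=1.0):
    """Hypothetical discretization: an element becomes 1 if its neuron
    fired at least once, else 0. One input sequence per prompt element."""
    return [1 if any(integrate_and_fire(seq, threshold)) else 0
            for seq in signals]


# A weak but sustained signal accumulates and eventually fires.
sustained = [0.5, 0.5, 0.5, 0.5]
# A larger but transient fluctuation cancels out before reaching threshold.
transient = [0.9, -0.9, 0.0, 0.0]

print(integrate_and_fire(sustained))         # [0, 1, 0, 1]
print(integrate_and_fire(transient))         # [0, 0, 0, 0]
print(binary_prompt([sustained, transient])) # [1, 0]
```

In this toy setting the transient input has a larger peak amplitude than the sustained one, yet only the sustained input produces spikes, which is the filtering behavior the abstract attributes to the IF mechanism. Once computed, a binary prompt like `[1, 0]` is a fixed vector, consistent with the claim that the prompts stay static at deployment.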