This paper introduces Low-Rank visual Spike Prompting (LoRSP), a framework that uses spiking neural networks (SNNs) to create dynamic, low-rank sparse visual prompts for adapting large-scale vision models. By exploiting the brain-inspired sparse firing mechanism of spiking neurons, LoRSP generates instance-specific prompts intended to improve generalization and energy efficiency relative to traditional dense pixel-level prompting methods. Experiments on five heterogeneous vision backbones show that LoRSP achieves competitive performance while requiring fewer tunable parameters than existing visual prompting methods.
Sparse visual prompts generated by LoRSP achieve robust adaptation with significantly fewer parameters, challenging the efficiency of traditional dense prompting methods.
Visual Prompting (VP) has emerged as an efficient paradigm for adapting large-scale pre-trained vision models to downstream tasks by incorporating learnable prompts at the input level. However, existing VP methods typically employ dense pixel-level prompts, which often suffer from redundant perturbations, limited generalization, and energy inefficiency. To overcome these limitations, we propose to integrate brain-inspired spiking learning into visual prompt learning. Spiking neurons are known to process information inexpensively by converting input data into discrete spike trains and returning sparse outputs. Inspired by this, we propose \textbf{Lo}w-\textbf{R}ank visual \textbf{S}pike \textbf{P}rompting (LoRSP), a novel framework that learns dynamic low-rank sparse visual prompts via a spiking neuron learning mechanism. The core idea of LoRSP is to exploit the brain-inspired sparse firing mechanism of spiking neurons to generate a pixel-level sparse prompt for each instance. Specifically, we first construct a series of prompt factors via low-rank factorization to capture distinct prompt subspaces. These prompt factors are then fed into an SNN architecture, which performs the integrate-and-fire process to emit spikes. As a result, LoRSP generates a \emph{sparse} visual prompt while maintaining the low-rank constraint. This design enables instance-specific selective prompting, leading to more compact and robust adaptation across diverse downstream tasks. Extensive experiments on five heterogeneous vision backbones and multiple benchmarks demonstrate that LoRSP achieves competitive performance while requiring fewer tunable parameters than existing VP methods.
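The two steps described in the abstract (low-rank prompt factors, then an integrate-and-fire process that emits spikes) can be illustrated with a minimal NumPy sketch. This is not the authors' architecture. The function names, the shapes `(K, H, r)` and `(K, r, W)`, the leaky integrate-and-fire update with a hard reset, the `threshold` and `decay` values, and the use of the firing rate as the prompt are all assumptions made for illustration. The sketch also omits conditioning on the input image and does not attempt to reproduce how the paper preserves the low-rank constraint after spiking.

```python
import numpy as np

def lif_spikes(currents, threshold=1.0, decay=0.5):
    """Leaky integrate-and-fire over T steps (hypothetical helper).

    currents: array of shape (T, H, W), one input current map per time step.
    Returns a binary spike array of the same shape.
    """
    currents = np.asarray(currents, dtype=float)
    membrane = np.zeros_like(currents[0])
    spikes = np.zeros_like(currents)
    for t, current in enumerate(currents):
        membrane = decay * membrane + current      # integrate (with leak)
        fired = membrane >= threshold              # fire where threshold is reached
        spikes[t] = fired
        membrane = np.where(fired, 0.0, membrane)  # hard reset after a spike
    return spikes

def sparse_low_rank_prompt(U, V, threshold=1.0, decay=0.5):
    """Build a sparse prompt map from K low-rank factor pairs (assumed layout).

    U: (K, H, r) and V: (K, r, W). Each product U[k] @ V[k] is a rank-r map
    that is treated here as the input current for time step k.
    """
    currents = np.matmul(U, V)                 # (K, H, W), each slice has rank <= r
    spikes = lif_spikes(currents, threshold, decay)
    return spikes.mean(axis=0)                 # firing rate per pixel, in [0, 1]

# Usage with illustrative sizes: K=4 factor pairs of rank r=2 for a 32x32 prompt.
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 32, 2))
V = rng.normal(size=(4, 2, 32))
prompt = sparse_low_rank_prompt(U, V)
print(prompt.shape, float((prompt == 0).mean()))  # shape and fraction of silent pixels
```

In this sketch the learnable quantities are only the factors, so the parameter count is `K * r * (H + W)` rather than `H * W` for a dense single-channel prompt of the same size. Pixels whose membrane potential never reaches the threshold stay exactly zero, which is where the sparsity comes from.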