Code and trained models will be made publicly available upon acceptance.

Mar 17, 2026

arXiv:2603.16338

SpikeCLR: Contrastive Self-Supervised Learning for Few-Shot Event-Based Vision using Spiking Neural Networks

Maxime Vaillant, Axel Carlier, Lai Xing Ng, Christophe Hurter, Benoit R. Cottereau

AI Summary

SpikeCLR, a contrastive self-supervised learning framework, is introduced to train Spiking Neural Networks (SNNs) on unlabeled event data by adapting frame-based contrastive learning methods to the spiking domain using surrogate gradient training. The framework incorporates event-specific augmentations, including spatial, temporal, and polarity transformations, to learn robust visual representations. Experiments on several event-based datasets demonstrate that SpikeCLR pretraining followed by fine-tuning outperforms supervised learning in few-shot and semi-supervised scenarios, with spatial and temporal augmentations being crucial for learning spatio-temporal invariances.
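To make the contrastive objective concrete, here is a minimal NumPy sketch of an NT-Xent loss as used in SimCLR-style frame-based methods. This is an assumption: the summary does not state which contrastive loss SpikeCLR uses, and the function name `nt_xent_loss`, the temperature value, and the embedding shapes are illustrative only. In a real pipeline the embeddings would come from an SNN encoder trained with surrogate gradients, which is not shown here.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for paired embeddings z1, z2 of shape (N, D).

    Row i of z1 and row i of z2 are two augmented views of the same sample.
    Assumes no embedding is the zero vector (normalisation would divide by 0).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit norm -> cosine similarity
    sim = z @ z.T / temperature                         # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                      # a view is never its own candidate

    # Index of the positive partner for every row: i <-> i + N.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm

    # Average negative log-probability assigned to the positive partner.
    return -log_prob[np.arange(2 * n), pos].mean()
```

The loss is low when the two views of each sample are closer to each other than to any other sample in the batch, which is the property that pretraining on unlabeled event data relies on.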

Key Contribution

SpikeCLR, a new contrastive self-supervised learning framework, lets SNNs learn robust visual representations from unlabeled event data. Pretraining followed by fine-tuning outperforms purely supervised training in low-data regimes.

Abstract

Event-based vision sensors provide significant advantages for high-speed perception, including microsecond temporal resolution, high dynamic range, and low power consumption. When combined with Spiking Neural Networks (SNNs), they can be deployed on neuromorphic hardware, enabling energy-efficient applications on embedded systems. However, this potential is severely limited by the scarcity of large-scale labeled datasets required to effectively train such models. In this work, we introduce SpikeCLR, a contrastive self-supervised learning framework that enables SNNs to learn robust visual representations from unlabeled event data. We adapt prior frame-based methods to the spiking domain using surrogate gradient training and introduce a suite of event-specific augmentations that leverage spatial, temporal, and polarity transformations. Through extensive experiments on CIFAR10-DVS, N-Caltech101, N-MNIST, and DVS-Gesture benchmarks, we demonstrate that self-supervised pretraining with subsequent fine-tuning outperforms supervised learning in low-data regimes, achieving consistent gains in few-shot and semi-supervised settings. Our ablation studies reveal that combining spatial and temporal augmentations is critical for learning effective spatio-temporal invariances in event data. We further show that learned representations transfer across datasets, contributing to efforts for powerful event-based models in label-scarce settings.
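The three augmentation families named in the abstract (spatial, temporal, polarity) can be illustrated with a short NumPy sketch. Everything below is an assumption for illustration: the event layout (rows of `x, y, t, p` with polarity in {0, 1}), the function names, the crop fraction, and the 0.5 application probabilities are hypothetical and not taken from the paper, which may use different transformations and parameters.

```python
import numpy as np

# Hypothetical event layout: one row per event, columns (x, y, t, p),
# with polarity p in {0, 1}. The paper's actual format is not specified here.

def flip_horizontal(events, width):
    """Spatial augmentation: mirror x coordinates across the sensor width."""
    out = events.copy()
    out[:, 0] = width - 1 - out[:, 0]
    return out

def flip_polarity(events):
    """Polarity augmentation: swap ON and OFF events."""
    out = events.copy()
    out[:, 3] = 1 - out[:, 3]
    return out

def random_time_crop(events, fraction, rng):
    """Temporal augmentation: keep a random window covering `fraction` of the duration."""
    t = events[:, 2]
    t_min, t_max = t.min(), t.max()
    window = (t_max - t_min) * fraction
    start = rng.uniform(t_min, t_max - window)
    mask = (t >= start) & (t <= start + window)
    out = events[mask].copy()
    out[:, 2] -= start  # re-reference timestamps to the start of the window
    return out

def two_views(events, width, rng):
    """Produce two independently augmented views of one event stream."""
    views = []
    for _ in range(2):
        view = random_time_crop(events, 0.8, rng)
        if rng.random() < 0.5:
            view = flip_horizontal(view, width)
        if rng.random() < 0.5:
            view = flip_polarity(view)
        views.append(view)
    return views
```

A contrastive framework would pass both views through the same encoder and train it to map them to nearby embeddings, so the choice of augmentations determines which invariances the representation acquires.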

Architecture Design (Transformers, SSMs, MoE)

Computer Vision

Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
