Jun 9, 2026 · arXiv:2606.10887

Listen, Look, and Learn: Learning Without Forgetting through SAM-Audio

Avi Gupta, Nilotpal Sinha, Vishnu Raj, Sambuddha Saha, Pratik Joshi, Koteswar Rao Jerripothula, Tammam Tillo

AI Summary

This paper addresses Class-Incremental Learning (CIL) in the audio-visual domain by integrating the rich static priors of the SAM-Audio model into an incremental learning framework. The authors introduce a guided attention strategy in which audio features contextually guide the visual representations, together with dual-level distillation objectives (at the feature and logit levels) to combat catastrophic forgetting. Their evaluations on audio-visual CIL benchmarks show that the approach consistently outperforms existing state-of-the-art methods.

Key Contribution

Audio-guided attention over SAM-Audio's dense visual representations, combined with feature-level and logit-level distillation, mitigates catastrophic forgetting in audio-visual Class-Incremental Learning.

Abstract

Class-Incremental Learning (CIL) aims to continuously learn new classes without forgetting previously acquired knowledge. While recent CIL advances have spurred significant interest across various modalities, the audio-visual setting remains underexplored. Furthermore, although foundational multimodal models like SAM-Audio encapsulate rich static priors, our empirical analysis reveals that these representations struggle in incremental settings. This work bridges this gap by integrating SAM-Audio's audio-visual priors into the CIL setting. Specifically, we leverage its dense audio and visual representations and employ a novel guided attention strategy where the audio features contextually guide the visual representations. To further mitigate catastrophic forgetting, we introduce dual-level distillation objectives at both the feature and logit levels. Extensive evaluations on audio-visual CIL benchmarks demonstrate that our approach consistently outperforms state-of-the-art methods.
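
The abstract names two mechanisms without giving their equations: audio-guided attention over visual features, and distillation at the feature and logit levels. The following is a minimal sketch of one common way such components are built, not the authors' implementation. It assumes scaled dot-product cross-attention with visual tokens as queries and audio tokens as keys and values, a residual connection, mean squared error for feature distillation, and temperature-softened KL divergence for logit distillation. All function names, the temperature, and the weights `lambda_feat` and `lambda_logit` are hypothetical.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def audio_guided_attention(visual, audio):
    """Cross-attention: visual tokens are queries, audio tokens are keys/values.

    Assumption: no learned projections, and the audio context is added
    residually to each visual token. The paper may differ on both points.
    """
    dim = len(visual[0])
    guided = []
    for v in visual:
        # Attention weights of this visual token over all audio tokens.
        weights = softmax([dot(v, a) / math.sqrt(dim) for a in audio])
        # Weighted sum of audio tokens (the audio context for this token).
        context = [sum(w * a[j] for w, a in zip(weights, audio)) for j in range(dim)]
        guided.append([vj + cj for vj, cj in zip(v, context)])
    return guided


def feature_distillation(old_feats, new_feats):
    """Mean squared error between old-model and new-model features."""
    diffs = [(o - n) ** 2
             for old_row, new_row in zip(old_feats, new_feats)
             for o, n in zip(old_row, new_row)]
    return sum(diffs) / len(diffs)


def logit_distillation(old_logits, new_logits, temperature=2.0):
    """KL(old || new) restricted to the classes the old model knows."""
    n_old = len(old_logits)
    p = softmax(old_logits, temperature)
    q = softmax(new_logits[:n_old], temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def total_loss(ce, feat_kd, logit_kd, lambda_feat=1.0, lambda_logit=1.0):
    """Classification loss plus the two weighted distillation terms."""
    return ce + lambda_feat * feat_kd + lambda_logit * logit_kd
```

In this sketch the old model is the frozen network from the previous incremental step, so both distillation terms are zero when the new model reproduces the old model's features and old-class logits exactly, and they grow as the new model drifts away from them.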

Multimodal Models, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
