School of Computer Engineering | TU Munich | Mar 9, 2026 | arXiv:2603.08154

Soundscapes in Spectrograms: Pioneering Multilabel Classification for South Asian Sounds

Sudip Chakrabarty, Pappu Bishwas, Rajdeep Chatterjee, Tathagata Bandyopadhyay, Digonto Biswas, Bibek Howlader

AI Summary

This paper introduces a CNN-based approach for multilabel environmental sound classification using spectrograms, addressing the limitations of MFCC-based methods in capturing complex, overlapping sounds common in South Asian soundscapes. The method was evaluated on the SAS-KIIT dataset and UrbanSound8K, demonstrating significantly higher classification accuracy compared to MFCC-based techniques. The core of the approach involves training a CNN directly on spectrogram representations of the audio signals for multilabel classification.
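To make the multilabel setting concrete, the following is a minimal sketch of how overlapping sounds can be encoded as a multi-hot target and scored with independent per-label sigmoids. It is an illustration under stated assumptions, not the authors' implementation: the label names, the logits, and the 0.5 threshold are all hypothetical, and the paper's actual loss and output layer are not described in this summary.

```python
import numpy as np

# Hypothetical label set for illustration; not taken from SAS-KIIT or UrbanSound8K.
LABELS = ["traffic_horn", "temple_bell", "rain", "crowd"]


def encode_multi_hot(active, labels=LABELS):
    """Return a 0/1 vector with a 1 for every label present in the clip."""
    return np.array([1.0 if name in active else 0.0 for name in labels])


def sigmoid(logits):
    """Map raw scores to independent per-label probabilities."""
    return 1.0 / (1.0 + np.exp(-logits))


def binary_cross_entropy(logits, targets, eps=1e-7):
    """Mean per-label binary cross-entropy, a common multilabel loss."""
    probs = np.clip(sigmoid(logits), eps, 1.0 - eps)
    per_label = targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)
    return float(-np.mean(per_label))


def decode(logits, threshold=0.5, labels=LABELS):
    """Decide each label independently, so several can be active at once."""
    probs = sigmoid(logits)
    return [name for name, p in zip(labels, probs) if p >= threshold]


# A clip in which a horn and rain overlap.
target = encode_multi_hot({"traffic_horn", "rain"})
logits = np.array([2.0, -3.0, 1.5, -0.5])  # made-up network outputs
predicted = decode(logits)
loss = binary_cross_entropy(logits, target)
```

Unlike a softmax over classes, the per-label sigmoid does not force the probabilities to sum to one, which is what allows two or more sound classes to be predicted for the same clip.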

Key Contribution

Spectrograms beat MFCCs for South Asian sound classification, unlocking more accurate analysis of complex, overlapping urban soundscapes.

Abstract

Environmental sound classification is a field of growing importance for urban monitoring and cultural soundscape analysis, especially within the acoustically rich environments of South Asia. These regions present a unique challenge as multiple natural, human, and cultural sounds often overlap, straining traditional methods that frequently rely on Mel Frequency Cepstral Coefficients (MFCC). This study introduces a novel spectrogram-based methodology with a superior ability to capture these complex auditory patterns. A Convolutional Neural Network (CNN) architecture is implemented to solve a demanding multilabel, multiclass classification problem on the SAS-KIIT dataset. To demonstrate robustness and comparability, the approach is also validated using the renowned UrbanSound8K dataset. The results confirm that the proposed spectrogram-based method significantly outperforms existing MFCC-based techniques, achieving higher classification accuracy across both datasets. This improvement lays the groundwork for more robust and accurate audio classification systems in real-world applications.
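As a rough illustration of the input representation the abstract refers to, the sketch below computes a log-magnitude spectrogram with a short-time Fourier transform using only NumPy. The frame length, hop size, sample rate, and test signal are assumptions chosen for the example; the abstract does not state the parameters or the exact spectrogram variant used in the paper.

```python
import numpy as np


def log_spectrogram(signal, n_fft=512, hop=256, eps=1e-10):
    """Log-magnitude STFT spectrogram with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Real FFT per frame gives magnitudes for the non-negative frequencies.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels and put frequency on the first axis.
    return 20.0 * np.log10(magnitude + eps).T


# Assumed 16 kHz sample rate and a one-second synthetic clip with two
# overlapping tones standing in for overlapping sound sources.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

spec = log_spectrogram(clip)
```

The resulting 2-D array keeps both tones visible as separate horizontal bands over time, and an array of this kind is what a CNN would consume as a single-channel image. MFCCs compress each frame into a small set of cepstral coefficients, which is the trade-off the paper argues against for overlapping sounds.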

Computer Vision, Multimodal Models, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 16

Year: 2026

Venue: N/A
