This paper introduces a CNN-based approach to multilabel environmental sound classification using spectrograms. It addresses the limitations of MFCC-based methods in capturing the complex, overlapping sounds common in South Asian soundscapes. The core of the approach is a CNN trained directly on spectrogram representations of the audio signals for multilabel classification. The method was evaluated on the SAS-KIIT and UrbanSound8K datasets, where it achieved significantly higher classification accuracy than MFCC-based techniques.
Spectrograms beat MFCCs for South Asian sound classification, unlocking more accurate analysis of complex, overlapping urban soundscapes.
Environmental sound classification is a field of growing importance for urban monitoring and cultural soundscape analysis, especially in the acoustically rich environments of South Asia. These regions pose a distinctive challenge: natural, human, and cultural sounds frequently overlap, which strains traditional methods that rely on Mel-Frequency Cepstral Coefficients (MFCCs). This study introduces a novel spectrogram-based methodology that better captures these complex auditory patterns. A Convolutional Neural Network (CNN) is applied to a demanding multilabel, multiclass classification problem on the SAS-KIIT dataset. To demonstrate robustness and comparability, the approach is also validated on the widely used UrbanSound8K dataset. The results show that the proposed spectrogram-based method significantly outperforms existing MFCC-based techniques, achieving higher classification accuracy on both datasets. This improvement lays the groundwork for more robust and accurate audio classification systems in real-world applications.
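The two ends of such a pipeline can be sketched in NumPy. This is a minimal illustration under stated assumptions and is not the authors' implementation. The summary does not give the STFT settings, the CNN layers, or the decision threshold, so `n_fft=512`, `hop=256`, the 16 kHz sample rate, the 0.5 threshold, and the helper names `log_spectrogram`, `multilabel_bce`, and `predict_labels` are all hypothetical. The convolutional layers that would map the spectrogram to per-class logits are omitted. By contrast, an MFCC front end would further pass each frame through a mel filterbank and a DCT, compressing it to a few dozen coefficients. The spectrogram keeps every frequency bin, so the network sees more of the detail where overlapping sources differ.

```python
import numpy as np

def log_spectrogram(signal, n_fft=512, hop=256, eps=1e-10):
    """Log-magnitude STFT spectrogram, shape (n_fft // 2 + 1, n_frames).

    n_fft and hop are assumed values, not the paper's settings.
    """
    if len(signal) < n_fft:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice overlapping frames and taper each one with a Hann window.
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft//2 + 1)
    return np.log(magnitude + eps).T  # frequency on rows, time on columns

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over every (clip, class) pair."""
    p = np.clip(sigmoid(logits), eps, 1 - eps)
    return float(-np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p)))

def predict_labels(logits, threshold=0.5):
    """Decide each class independently, so several sounds can be active at once."""
    return (sigmoid(logits) >= threshold).astype(int)

# Usage: two overlapping tones stand in for overlapping sound sources.
sr = 16000  # assumed sample rate
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
print(log_spectrogram(clip).shape)  # (257, 61)

# Hypothetical per-class logits for 2 clips x 3 classes
# (e.g. "traffic", "temple_bells", "birdsong"; class names are illustrative only).
logits = np.array([[2.0, -1.0, 0.5], [-3.0, 4.0, -0.5]])
targets = np.array([[1, 0, 1], [0, 1, 0]])
print(predict_labels(logits))
print(multilabel_bce(logits, targets))
```

The output layer uses one sigmoid per class instead of a softmax. A softmax forces classes to compete for a single probability mass, which suits single-label tasks. Independent sigmoids trained with binary cross-entropy let a clip carry several labels at once, which matches the multilabel setting with overlapping sources.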