This paper introduces a multi-band encoding framework that uses the full spectrum of bioacoustic recordings, addressing the bandwidth limit of standard audio models pre-trained at 16 kHz. The approach decomposes the full spectrum into band-specific features, fuses them into a unified representation, and analyzes the similarity of the resulting band embeddings. Classification experiments on three bioacoustic datasets show that fused representations outperform the baseband and time-expansion baselines on two of them, and that encoders producing decorrelated band embeddings give better class separation after fusion.
Unlocking the full spectrum of animal sounds, previously discarded by standard audio models, can significantly improve bioacoustic classification.
Animals hear and vocalize across frequency ranges that differ substantially from those of humans, often extending into the ultrasonic domain. Yet most computational bioacoustics systems rely on audio models pre-trained at 16 kHz, restricting their usable bandwidth to the 0-8 kHz baseband and discarding higher-frequency information present in many bioacoustic recordings. We investigate a multi-band encoding framework that decomposes the full spectrum of animal calls into band features and fuses them into a unified representation. Similarity analyses across models show that certain encoders produce decorrelated band embeddings that improve class separation after fusion. Classification experiments on three bioacoustic datasets, using eight pre-trained models and five fusion strategies, show that fused representations consistently outperform the baseband and time-expansion baselines on two of the datasets, indicating the potential of multi-band methods for full-spectrum encoding of animal calls.
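As a rough illustration of the decompose-then-fuse idea described above, the sketch below splits a high-sample-rate recording into consecutive 8 kHz bands, shifts each band down to the 0-8 kHz baseband at 16 kHz, encodes each band separately, and concatenates the band embeddings. It is not taken from the paper: the FFT-slicing decomposition, the function names (`split_into_bands`, `toy_encoder`, `encode_full_spectrum`), and the use of plain concatenation as the fusion step are all assumptions made for illustration, and `toy_encoder` is only a stand-in for a real pre-trained 16 kHz model.

```python
import numpy as np

BASE_SR = 16_000        # sample rate the (assumed) pre-trained encoder expects
BAND_HZ = BASE_SR // 2  # each band spans 8 kHz, the encoder's usable bandwidth


def split_into_bands(x, sr):
    """Split x (sampled at sr) into 8 kHz bands, each returned as a 16 kHz signal.

    Assumption: the band is shifted to baseband by slicing the rFFT bins of
    that band and inverting them as if they started at 0 Hz.
    """
    n = len(x)
    spectrum = np.fft.rfft(x)
    bins_per_band = int(round(BAND_HZ * n / sr))
    out_len = 2 * bins_per_band  # same duration at 16 kHz
    num_bands = int(np.ceil((sr / 2) / BAND_HZ))
    bands = []
    for k in range(num_bands):
        start = k * bins_per_band
        chunk = spectrum[start:start + bins_per_band + 1]
        # Zero-pad the last band if the Nyquist frequency is not a multiple of 8 kHz.
        chunk = np.pad(chunk, (0, bins_per_band + 1 - len(chunk)))
        # Rescale so a unit-amplitude tone keeps unit amplitude after resizing.
        bands.append(np.fft.irfft(chunk, n=out_len) * (out_len / n))
    return bands


def toy_encoder(wave_16k, dim=32):
    """Hypothetical stand-in for a pre-trained model: log sub-band magnitudes."""
    mag = np.abs(np.fft.rfft(wave_16k))
    return np.log1p(np.array([g.mean() for g in np.array_split(mag, dim)]))


def encode_full_spectrum(x, sr, encoder=toy_encoder):
    """Encode every band separately, then fuse by concatenation (an assumption)."""
    band_embeddings = [encoder(band) for band in split_into_bands(x, sr)]
    return np.concatenate(band_embeddings), band_embeddings


# Usage: a 20 kHz tone sampled at 48 kHz is invisible to a baseband-only model,
# but lands at 4 kHz inside the third band (16-24 kHz) after decomposition.
sr = 48_000
t = np.arange(sr) / sr
tone = np.cos(2 * np.pi * 20_000 * t)
bands = split_into_bands(tone, sr)
fused, per_band = encode_full_spectrum(tone, sr)
```

In this toy setup the baseband embedding carries almost no information about the tone, while the fused vector does, which is the intuition behind encoding beyond 0-8 kHz. A real pipeline would swap `toy_encoder` for an actual pre-trained model and could replace concatenation with any other fusion strategy.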