This paper benchmarks language model-based lossless audio compression across diverse audio types, sampling rates, and bit depths, addressing the gap in prior work limited to 8-bit audio. To handle the vocabulary size explosion at higher bit depths (16/24-bit), the authors introduce Trilobyte, a byte-level tokenization scheme that keeps vocabulary size constant, O(1), regardless of bit depth. Experiments show that LMs outperform FLAC at 8-bit and 16-bit, but compression gains diminish at 24-bit, suggesting limitations of current LM approaches for very high-fidelity audio compression.
Language models can beat FLAC for lossless audio compression at 8-bit and 16-bit, but their advantage shrinks at 24-bit, revealing a challenge for high-fidelity audio.
Autoregressive "language" models (LMs) trained on raw waveforms can be repurposed for lossless audio compression, but prior work is limited to 8-bit audio, leaving open whether such approaches work for practical settings (16/24-bit) and can compete with existing codecs. We benchmark LM-based compression on full-fidelity audio across diverse domains (music, speech, bioacoustics), sampling rates (16kHz-48kHz), and bit depths (8, 16, 24-bit). Standard sample-level tokenization becomes intractable at higher bit depths due to vocabulary size (65K for 16-bit; 16.7M for 24-bit). We propose Trilobyte, a byte-level tokenization schema for full-resolution audio, improving vocabulary scaling from $O(2^{b})$ to $O(1)$ and enabling the first tractable 24-bit LM-based lossless compression. While LMs consistently outperform FLAC and yield state-of-the-art compression at 8-bit and 16-bit, we observe that compression gains become more modest as bit depth increases beyond 8-bit.
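The vocabulary argument above can be illustrated with a minimal sketch. It is not the Trilobyte implementation, whose exact layout the abstract does not specify. It assumes signed integer PCM samples, an offset-binary mapping to unsigned values, and big-endian byte order; the function names `sample_vocab_size`, `to_byte_tokens`, and `from_byte_tokens` are hypothetical. Sample-level tokenization needs $2^{b}$ symbols, whereas splitting each sample into bytes keeps the vocabulary at 256 symbols at the cost of a longer token sequence.

```python
from typing import List


def sample_vocab_size(bit_depth: int) -> int:
    """Vocabulary size when every possible sample value is its own token."""
    return 2 ** bit_depth


def to_byte_tokens(samples: List[int], bit_depth: int) -> List[int]:
    """Split signed PCM samples into byte tokens in the range 0..255.

    Assumptions (not taken from the paper): offset-binary mapping and
    big-endian byte order, most significant byte first.
    """
    n_bytes = bit_depth // 8
    offset = 1 << (bit_depth - 1)
    tokens: List[int] = []
    for s in samples:
        u = s + offset  # shift the signed range into 0 .. 2^b - 1
        if not 0 <= u < (1 << bit_depth):
            raise ValueError(f"sample {s} out of range for {bit_depth}-bit audio")
        tokens.extend(u.to_bytes(n_bytes, "big"))
    return tokens


def from_byte_tokens(tokens: List[int], bit_depth: int) -> List[int]:
    """Invert to_byte_tokens, so the tokenization is lossless."""
    n_bytes = bit_depth // 8
    offset = 1 << (bit_depth - 1)
    return [
        int.from_bytes(bytes(tokens[i:i + n_bytes]), "big") - offset
        for i in range(0, len(tokens), n_bytes)
    ]


if __name__ == "__main__":
    print(sample_vocab_size(16), sample_vocab_size(24))
    samples = [-8388608, -1, 0, 1, 8388607]  # extremes of the 24-bit range
    tokens = to_byte_tokens(samples, 24)
    print(tokens)
    print(from_byte_tokens(tokens, 24) == samples)
```

Under these assumptions a 24-bit sample becomes three tokens, so the model's output layer stays at 256 entries while the sequence it must model is three times longer.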