This paper introduces VoCodec, a low-bitrate streamable neural speech codec built around voicing-driven quantization: voiced frames receive a higher bitrate and unvoiced frames a lower one. A voicing detector is embedded in a fully causal encoder-quantizer-decoder framework, which applies residual scalar-vector quantization to voiced frames and simple scalar quantization to unvoiced frames. Experiments on the LibriTTS dataset show that VoCodec outperforms baseline neural speech codecs even at a bitrate as low as 1.1 kbps, and that voicing-driven quantization reduces the bitrate by approximately 27% compared with uniform quantization.
VoCodec reduces bitrate by approximately 27% relative to uniform quantization by giving voiced frames more bits than unvoiced ones, and it outperforms baseline neural speech codecs at bitrates as low as 1.1 kbps.
Neural speech codecs are key to speech transmission and storage, but most use uniform quantization across frames, allocating the same bitrate regardless of content and wasting bits. We propose VoCodec, a low-bitrate streamable neural speech codec with voicing-driven quantization that assigns higher bitrate to voiced frames and lower bitrate to unvoiced frames according to perceptual sensitivity. VoCodec embeds a voicing detector in a fully causal encoder-quantizer-decoder neural coding framework, using residual scalar-vector quantization for voiced frames and simple scalar quantization for unvoiced ones. Experiments show that on the LibriTTS dataset at a 16 kHz sampling rate, VoCodec outperforms baseline neural speech codecs even at a bitrate as low as 1.1 kbps. Further experiments confirm that introducing voicing-driven quantization reduces the bitrate by approximately 27% compared with a uniform quantization strategy.
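To make the bitrate argument concrete, the following is a minimal sketch of how a per-frame voicing decision changes the average bitrate relative to a uniform allocation. It is not the VoCodec implementation. The function names, the frame rate, the per-frame bit budgets, the voiced fraction, and the one-bit voicing flag are all illustrative assumptions that do not come from the paper, so the printed saving is not expected to match the reported 27%.

```python
def average_bitrate_bps(frame_rate_hz, voiced_bits, unvoiced_bits,
                        voiced_fraction, flag_bits=1):
    """Expected bitrate when each frame is coded according to its voicing label.

    All parameters are hypothetical; flag_bits models an assumed per-frame
    side-information bit that tells the decoder which quantizer was used.
    """
    if not 0.0 <= voiced_fraction <= 1.0:
        raise ValueError("voiced_fraction must lie in [0, 1]")
    bits_per_frame = (voiced_fraction * voiced_bits
                      + (1.0 - voiced_fraction) * unvoiced_bits
                      + flag_bits)
    return frame_rate_hz * bits_per_frame


def saving_vs_uniform(frame_rate_hz, voiced_bits, unvoiced_bits,
                      voiced_fraction, flag_bits=1):
    """Relative bitrate saving against coding every frame at the voiced budget."""
    uniform = frame_rate_hz * voiced_bits
    adaptive = average_bitrate_bps(frame_rate_hz, voiced_bits, unvoiced_bits,
                                   voiced_fraction, flag_bits)
    return 1.0 - adaptive / uniform


if __name__ == "__main__":
    # Illustrative numbers only: 50 frames/s, 40 bits per voiced frame,
    # 8 bits per unvoiced frame, half of the frames voiced.
    rate = average_bitrate_bps(50, 40, 8, 0.5)
    saving = saving_vs_uniform(50, 40, 8, 0.5)
    print(f"adaptive bitrate: {rate:.0f} bps, saving vs uniform: {saving:.1%}")
```

Under these assumptions the saving grows with the share of unvoiced frames and with the gap between the two bit budgets, which is the intuition behind allocating bits by voicing.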