HITHumanoid Robot (Shanghai) Co. | Apr 16, 2026 | arXiv:2604.14654

ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning

Junyi Wang, Chi Zhang, Jing Qian, Haifeng Luo, Hao Wang, Zengrui Jin, Chao Zhang

AI Summary

ClariCodec, a neural speech codec operating at 200 bps, is introduced to address bandwidth-constrained communication scenarios where intelligibility is paramount. The codec reformulates quantization as a stochastic policy, enabling reinforcement learning (RL)-based optimization of intelligibility by fine-tuning the encoder using WER-driven rewards while keeping the acoustic reconstruction pipeline frozen. Results show that ClariCodec achieves state-of-the-art WER at 200 bps, with further RL fine-tuning reducing WER to 3.20% on test-clean and 8.93% on test-other.
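To make the "WER-driven reward" concrete, the following is a minimal sketch of how a word error rate could be computed and turned into a scalar reward. It assumes the standard word-level edit-distance definition of WER; the function names `word_error_rate` and `wer_reward`, and the choice of negated WER as the reward, are illustrative assumptions rather than details taken from the paper. The ASR system that would produce the transcript is not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def wer_reward(reference: str, asr_transcript: str) -> float:
    """Hypothetical reward: lower WER on the decoded speech gives a higher reward."""
    return -word_error_rate(reference, asr_transcript)
```

In this sketch, `asr_transcript` would be the output of a speech recogniser run on the codec's reconstructed audio, so the reward depends only on intelligibility and not on waveform fidelity.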

Key Contribution

Achieves a 3.20% WER on LibriSpeech test-clean at only 200 bits per second by using reinforcement learning to optimize a neural codec directly for intelligibility.

Abstract

In bandwidth-constrained communication such as satellite and underwater channels, speech must often be transmitted at ultra-low bitrates where intelligibility is the primary objective. At such extreme compression levels, codecs trained with acoustic reconstruction losses tend to allocate bits to perceptual detail, leading to substantial degradation in word error rate (WER). This paper proposes ClariCodec, a neural speech codec operating at 200 bits per second (bps) that reformulates quantisation as a stochastic policy, enabling reinforcement learning (RL)-based optimisation of intelligibility. Specifically, the encoder is fine-tuned using WER-driven rewards while the acoustic reconstruction pipeline remains frozen. Even without RL, ClariCodec achieves 3.68% WER on the LibriSpeech test-clean set at 200 bps, already competitive with codecs operating at higher bitrates. Further RL fine-tuning reduces WER to 3.20% on test-clean and 8.93% on test-other, corresponding to a 13% relative reduction while preserving perceptual quality.
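The idea of treating quantisation as a stochastic policy can be sketched as follows. The abstract does not say how the policy is parameterised, so this example makes an assumption: code indices are sampled from a softmax over negative squared distances between an encoder output `z` and each codebook vector, and the update signal is the plain REINFORCE gradient with respect to the logits. The names `policy_probs`, `sample_code`, `reinforce_logit_grad`, the `temperature` parameter, the toy codebook, and the reward and baseline values are all hypothetical.

```python
import math
import random


def policy_probs(z, codebook, temperature=1.0):
    """Categorical policy over code indices: softmax of negative squared distances."""
    logits = [
        -sum((zi - ci) ** 2 for zi, ci in zip(z, code)) / temperature
        for code in codebook
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_code(probs, rng):
    """Sample a code index from the policy; return the index and its log-probability."""
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, math.log(probs[idx])


def reinforce_logit_grad(probs, idx, reward, baseline=0.0):
    """Gradient of (reward - baseline) * log pi(idx) with respect to the logits."""
    advantage = reward - baseline
    return [
        advantage * ((1.0 if k == idx else 0.0) - p)
        for k, p in enumerate(probs)
    ]


# Toy usage with made-up values: three 2-D code vectors and one encoder output.
rng = random.Random(0)
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
z = [0.9, 0.1]
probs = policy_probs(z, codebook)
idx, log_prob = sample_code(probs, rng)
# Assumed reward is the negated WER of the decoded utterance; baseline is illustrative.
grad = reinforce_logit_grad(probs, idx, reward=-0.25, baseline=-0.5)
```

In a full system this gradient would be backpropagated into the encoder only, which is consistent with the abstract's statement that the acoustic reconstruction pipeline stays frozen. Deterministic nearest-neighbour quantisation is recovered in the limit of `temperature` going to zero.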

Inference & Quantization, Speech & Audio, Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 64

Year: 2026

Venue: N/A
