Mar 2, 2026 · arXiv:2603.02022

CodecFlow: Efficient Bandwidth Extension via Conditional Flow Matching in Neural Codec Latent Space

Bowen Zhang, Junchuan Zhao, Ian McLoughlin, A. S. Madhukumar

AI Summary

The paper introduces CodecFlow, a neural codec-based bandwidth extension (BWE) framework that operates in the latent space of neural audio codecs to efficiently reconstruct high-frequency speech content. CodecFlow addresses representation mismatch in latent-space BWE with two components: a voicing-aware conditional flow converter and a structure-constrained residual vector quantizer that improves latent alignment stability. Experiments show that CodecFlow achieves strong spectral fidelity and enhanced perceptual quality on both the 8 kHz to 16 kHz and the 44.1 kHz speech BWE tasks.

Key Contribution

Achieves high-fidelity bandwidth extension by operating directly in the latent space of neural audio codecs, sidestepping the computational cost and fidelity limitations of spectrogram- or waveform-based methods.

Abstract

Speech bandwidth extension (BWE) improves clarity and intelligibility by restoring or inferring appropriate high-frequency content for low-bandwidth speech. Existing methods often rely on spectrogram or waveform modeling, which can incur higher computational cost and have limited high-frequency fidelity. Neural audio codecs offer compact latent representations that better preserve acoustic detail, yet accurately recovering high-resolution latent information remains challenging due to representation mismatch. We present CodecFlow, a neural codec-based BWE framework that performs efficient speech reconstruction in a compact latent space. CodecFlow employs a voicing-aware conditional flow converter on continuous codec embeddings and a structure-constrained residual vector quantizer to improve latent alignment stability. Optimized end-to-end, CodecFlow achieves strong spectral fidelity and enhanced perceptual quality on 8 kHz to 16 kHz and 44.1 kHz speech BWE tasks.
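As a rough illustration of the conditional flow matching idea named in the title, the sketch below uses the common linear (straight-line) probability path: a point x_t is interpolated between a source vector x0 and a target vector x1, and the regression target for the velocity model is x1 - x0. This is a generic textbook formulation, not the paper's converter. The function names, the `cond` dictionary, and the oracle velocity that stands in for a trained network are all hypothetical, and the vectors are toy stand-ins for codec latents.

```python
# Minimal sketch of conditional flow matching on toy vectors (pure Python).
# Assumption: linear path x_t = (1 - t) * x0 + t * x1, which may differ
# from the path and conditioning used in the paper.

def cfm_training_pair(x0, x1, t):
    """Return (x_t, target velocity) for the linear path from x0 to x1."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target


def cfm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)


def euler_sample(velocity_fn, x0, cond, steps=8):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 (Euler)."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t never reaches 1.0, so the oracle below is safe
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(x1):
    """Hypothetical stand-in for a trained network: the exact velocity of
    the linear path toward x1. It ignores `cond`; a real model would use it."""
    def velocity(x, t, cond):
        return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]
    return velocity


if __name__ == "__main__":
    x0 = [0.0, 1.0, -1.0]    # toy source vector (e.g. noise)
    x1 = [0.5, -0.25, 2.0]   # toy target vector (e.g. wideband latent)
    cond = {"low_band": [0.1, 0.2, 0.3], "voiced": True}  # hypothetical
    x_t, v = cfm_training_pair(x0, x1, 0.25)
    print("x_t:", x_t, "target velocity:", v)
    print("sample:", euler_sample(make_oracle_velocity(x1), x0, cond))
```

In training, a network's predicted velocity at (x_t, t, cond) would be scored with `cfm_loss` against the target; at inference, a few Euler steps carry the source vector toward a target-like vector, and that small step count is where the efficiency of flow-based samplers typically comes from.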

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 39

Year: 2026

Venue: N/A
