Korea U · Mar 12, 2026 · arXiv:2603.11589

Toward Complex-Valued Neural Networks for Waveform Generation

Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee

AI Summary

This paper introduces ComVo, a complex-valued neural vocoder for waveform generation that directly processes complex spectrograms, unlike existing iSTFT-based vocoders built on real-valued networks. ComVo incorporates phase quantization to regularize phase transformations and a block-matrix computation scheme to improve training efficiency. Experiments show that ComVo achieves higher synthesis quality than real-valued baselines, and that its block-matrix scheme reduces training time by 25%.
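Phase quantization, as summarized above, can be illustrated with a minimal NumPy sketch. The paper's exact formulation is not given on this page, so every detail here is an assumption: the helper name quantize_phase, the choice of 16 uniformly spaced phase levels, and nearest-level rounding with the magnitude left unchanged are hypothetical. In a trained network the rounding step has zero gradient almost everywhere, so it would need a surrogate such as a straight-through estimator; whether ComVo uses one is not stated here.

```python
import numpy as np

def quantize_phase(z, levels=16):
    """Hypothetical phase quantizer: snap each complex value's phase to the
    nearest of `levels` uniformly spaced angles, keeping its magnitude."""
    step = 2 * np.pi / levels                  # angular spacing between levels
    phase = np.angle(z)                        # phase in (-pi, pi]
    q_phase = np.round(phase / step) * step    # nearest multiple of step
    return np.abs(z) * np.exp(1j * q_phase)    # same magnitude, discretized phase

# Usage: random complex values stand in for spectrogram bins.
rng = np.random.default_rng(0)
z = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
zq = quantize_phase(z, levels=16)
```

With 16 levels, each phase moves by at most pi/16 radians while the magnitude is untouched, so the quantizer constrains only the phase component.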

Key Contribution

Complex-valued neural networks can improve both the synthesis quality and the training efficiency of iSTFT-based neural vocoders: ComVo outperforms real-valued baselines, and its block-matrix computation scheme reduces training time by 25%.
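One standard way to organize complex-valued matrix products for efficiency is to express them as a single real block-matrix product, sketched below in NumPy. This illustrates the general idea only and is an assumption, not a reproduction of ComVo's scheme. Writing a complex weight as A + iB and an input as x + iy, the product (Ax - By) + i(Bx + Ay) equals the real matrix [[A, -B], [B, A]] applied to the stacked vector [x; y]. The helper names and shapes are hypothetical.

```python
import numpy as np

def complex_linear_naive(A, B, x, y):
    """(A + iB)(x + iy) computed with four separate real matrix products."""
    real = A @ x - B @ y
    imag = B @ x + A @ y
    return real, imag

def complex_linear_block(A, B, x, y):
    """Same product as one real matmul with the block matrix [[A, -B], [B, A]]."""
    W = np.block([[A, -B], [B, A]])       # shape (2*out, 2*in)
    z = np.concatenate([x, y], axis=0)    # shape (2*in, batch)
    out = W @ z                           # shape (2*out, batch)
    n_out = A.shape[0]
    return out[:n_out], out[n_out:]       # (real part, imaginary part)

# Usage: hypothetical 32 -> 64 complex layer applied to a batch of 10 inputs.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 64, 32))
x, y = rng.standard_normal((2, 32, 10))
r1, i1 = complex_linear_naive(A, B, x, y)
r2, i2 = complex_linear_block(A, B, x, y)
ref = (A + 1j * B) @ (x + 1j * y)         # NumPy's native complex matmul as a reference
```

Both versions perform the same number of multiply-adds. The block form's potential benefit is replacing four smaller matrix products and two additions with one larger product, which tends to map better onto GPU matrix kernels. This sketch does not establish whether this idea accounts for the reported 25% reduction.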

Abstract

Neural vocoders have recently advanced waveform generation, yielding natural and expressive audio. Among these approaches, vocoders based on the inverse short-time Fourier transform (iSTFT) have gained attention. They predict a complex-valued spectrogram and then synthesize the waveform via the iSTFT, thereby avoiding learned upsampling stages that can increase computational cost. However, current approaches use real-valued networks that process the real and imaginary parts independently, which limits their ability to capture the inherent structure of complex spectrograms. We present ComVo, a Complex-valued neural Vocoder whose generator and discriminator both use native complex arithmetic. This enables an adversarial training framework that provides structured feedback directly on complex-valued representations. To guide phase transformations in a structured manner, we introduce phase quantization, which discretizes phase values and regularizes the training process. Finally, we propose a block-matrix computation scheme that improves training efficiency by reducing redundant operations. Experiments demonstrate that ComVo achieves higher synthesis quality than comparable real-valued baselines, and that its block-matrix scheme reduces training time by 25%. Audio samples and code are available at https://hs-oh-prml.github.io/ComVo/.
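The iSTFT synthesis step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code. The helper names stft and istft, the periodic Hann window, n_fft=512 and hop=128 are all hypothetical choices. A forward STFT of random noise stands in for the complex spectrogram a generator would predict. The inverse transform uses weighted overlap-add normalized by the summed squared window.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Complex STFT with a periodic Hann window; returns (frames, n_fft//2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]            # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def istft(spec, n_fft=512, hop=128):
    """Inverse STFT by weighted overlap-add, normalized by the summed squared window."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=-1)  # back to real time-domain frames
    for i in range(n_frames):
        start = i * hop
        out[start:start + n_fft] += frames[i] * window
        norm[start:start + n_fft] += window ** 2
    covered = norm > 1e-8                          # skip samples with ~zero window weight
    out[covered] /= norm[covered]
    return out

# Usage: the STFT of noise stands in for a generator's complex-valued output.
rng = np.random.default_rng(0)
x = rng.standard_normal(512 + 128 * 30)
spec = stft(x)
y = istft(spec)
```

In a vocoder, the spectrogram would come from the generator rather than from stft(x). The ratio normalization in istft works for any window and hop, at the cost of leaving edge samples unreconstructed wherever the summed squared window is (near) zero, such as the first sample here.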

Architecture Design (Transformers, SSMs, MoE) · Speech & Audio · Training Efficiency & Optimization

Citation Metrics

Citations: 0
Influential citations: 0
References: 54
Year: 2026
Venue: N/A
