The paper introduces Dual-Decoder (Dude), a novel architecture for accelerating natural language generation inference. Dude employs a two-decoder structure: a semantic decoder to capture long-term dependencies and an output decoder for fast sequence prediction. Experiments across NMT, text summarization, and question generation demonstrate a 1.43x to 1.62x speedup compared to standard autoregressive baselines, while maintaining comparable performance.
Goodbye, autoregressive bottlenecks: a dual-decoder architecture achieves up to 1.62× faster inference without sacrificing quality.
Natural language generation is an important task in natural language processing and has been applied in many scenarios. Most state-of-the-art generation models, however, are slow at inference time, mainly due to the sequential dependencies of autoregressive decoding and the ever-growing size of decoder models. To this end, we propose a Dual-Decoder (Dude) model that speeds up decoding without sacrificing overall model performance. The Dude model consists of a semantic decoder, which captures long-term semantic dependencies, and an output decoder, which predicts the target sequence quickly. We evaluate the Dude model on three natural language generation tasks: Neural Machine Translation, Text Summarization, and Question Generation. The experimental results demonstrate that our model achieves 1.43× faster inference than the standard baseline while maintaining comparable performance, and up to 1.62× faster on longer sequence-generation tasks.
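To make the division of labor concrete, here is a minimal schematic sketch in NumPy of a two-stage decoder. It is not the paper's actual architecture: the toy dimensions, the single-attention-pass "semantic decoder", and the per-position "output decoder" projection are all illustrative assumptions; the real Dude model's layer structure is not specified in this summary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; not taken from the paper).
vocab, d_model, src_len, tgt_len = 50, 16, 8, 6

def semantic_decoder(enc_states, queries):
    """Hypothetical semantic decoder: one attention pass over encoder
    states that produces a semantic state per target position in
    parallel, standing in for 'capturing long-term dependencies'."""
    scores = queries @ enc_states.T / np.sqrt(d_model)        # (tgt, src)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # softmax
    return weights @ enc_states                               # (tgt, d_model)

def output_decoder(sem_states, proj):
    """Hypothetical output decoder: a cheap per-position projection to
    token logits, avoiding an expensive sequential loop at this stage."""
    logits = sem_states @ proj                                # (tgt, vocab)
    return logits.argmax(axis=-1)                             # greedy token ids

enc_states = rng.normal(size=(src_len, d_model))   # encoder outputs
queries = rng.normal(size=(tgt_len, d_model))      # e.g. positional queries
proj = rng.normal(size=(d_model, vocab))           # output projection

tokens = output_decoder(semantic_decoder(enc_states, queries), proj)
print(tokens.shape)  # one token id per target position
```

The point of the sketch is the split itself: the heavier semantic stage runs once over the whole target length, while the output stage is a lightweight map from semantic states to tokens, which is where the claimed inference speedup would come from.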