Tsinghua AI | Mar 18, 2026 | arXiv:2603.17837

The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

Donghang Wu, Tianyu Zhang, Yuxin Li, Hexin Liu, Chen Chen, Eng Siong Chng, Yoshua Bengio

AI Summary

The paper introduces FLAIR, a full-duplex spoken dialogue model that simulates human-like internal reasoning by performing latent thinking concurrently with speech perception. FLAIR uses a recursive latent embedding mechanism and an Evidence Lower Bound-based objective for efficient supervised finetuning, enabling continuous reasoning without additional latency. Experiments on speech benchmarks demonstrate that FLAIR achieves competitive results and robustly handles conversational dynamics in full-duplex interactions.

Key Contribution

Mimicking human cognition, FLAIR lets dialogue models "think while listening," boosting performance without adding latency.

Abstract

During conversational interactions, humans subconsciously engage in concurrent thinking while listening to a speaker. Although this internal cognitive processing may not always manifest as explicit linguistic structures, it is instrumental in formulating high-quality responses. Inspired by this cognitive phenomenon, we propose a novel Full-duplex LAtent and Internal Reasoning method named FLAIR that conducts latent thinking simultaneously with speech perception. Unlike conventional "thinking" mechanisms in NLP, which require post-hoc generation, our approach aligns seamlessly with spoken dialogue systems: during the user's speaking phase, it recursively feeds the latent embedding output from the previous step into the next step, enabling continuous reasoning that strictly adheres to causality without introducing additional latency. To enable this latent reasoning, we design an Evidence Lower Bound-based objective that supports efficient supervised finetuning via teacher forcing, circumventing the need for explicit reasoning annotations. Experiments demonstrate the effectiveness of this think-while-listening design, which achieves competitive results on a range of speech benchmarks. Furthermore, FLAIR robustly handles conversational dynamics and attains competitive performance on full-duplex interaction metrics.
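
The recursive feeding described in the abstract can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the FLAIR architecture: the function name `think_while_listening`, the weight matrices `W_in` and `W_lat`, the `tanh` update, and all dimensions are hypothetical stand-ins chosen for illustration. It shows one latent vector being produced per incoming frame, with the previous latent fed back into the next step, so that each latent depends only on frames heard so far.

```python
import numpy as np

def think_while_listening(frames, W_in, W_lat, z0):
    """Toy stand-in (hypothetical): one latent 'thought' per incoming frame."""
    z = z0
    latents = []
    for x_t in frames:  # frames arrive in time order
        # Feed the previous latent back in together with the current frame.
        z = np.tanh(W_in @ x_t + W_lat @ z)
        latents.append(z)
    return np.stack(latents)

rng = np.random.default_rng(0)
T, d_in, d_lat = 6, 4, 3  # illustrative sizes, not from the paper
frames = rng.normal(size=(T, d_in))
W_in = rng.normal(size=(d_lat, d_in))
W_lat = rng.normal(size=(d_lat, d_lat))
z0 = np.zeros(d_lat)

latents = think_while_listening(frames, W_in, W_lat, z0)

# Causality check: perturbing frame 4 must leave latents 0..3 unchanged.
perturbed = frames.copy()
perturbed[4] += 1.0
latents_perturbed = think_while_listening(perturbed, W_in, W_lat, z0)
print(np.allclose(latents[:4], latents_perturbed[:4]))
```

Because each step consumes only the current frame and the previous latent, no extra decoding pass is needed after the user stops speaking, which is the sense in which such a loop adds no latency in this simplified setting.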

Natural Language Processing | Reasoning & Chain-of-Thought | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 76

Year: 2026

Venue: N/A
