The paper introduces FLAIR, a full-duplex spoken dialogue model that simulates human-like internal reasoning by performing latent thinking concurrently with speech perception. FLAIR uses a recursive latent embedding mechanism and an Evidence Lower Bound-based objective for efficient supervised finetuning, enabling continuous reasoning without additional latency. Experiments on speech benchmarks demonstrate that FLAIR achieves competitive results and robustly handles conversational dynamics in full-duplex interactions.
Mimicking human cognition, FLAIR lets dialogue models "think while listening," boosting performance without adding latency.
During conversational interactions, humans subconsciously engage in concurrent thinking while listening to a speaker. Although this internal cognitive processing may not always manifest as explicit linguistic structures, it is instrumental in formulating high-quality responses. Inspired by this cognitive phenomenon, we propose a novel Full-duplex LAtent and Internal Reasoning method named FLAIR that conducts latent thinking simultaneously with speech perception. Unlike conventional "thinking" mechanisms in NLP, which require post-hoc generation, our approach aligns seamlessly with spoken dialogue systems: during the user's speaking phase, it recursively feeds the latent embedding output from the previous step into the next step, enabling continuous reasoning that strictly adheres to causality without introducing additional latency. To enable this latent reasoning, we design an Evidence Lower Bound-based objective that supports efficient supervised finetuning via teacher forcing, circumventing the need for explicit reasoning annotations. Experiments demonstrate the effectiveness of this think-while-listening design, which achieves competitive results on a range of speech benchmarks. Furthermore, FLAIR robustly handles conversational dynamics and attains competitive performance on full-duplex interaction metrics.
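The recursive feeding described in the abstract can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the FLAIR architecture: the names `make_toy_step` and `think_while_listening`, the random linear-plus-tanh update, and the fixed vector size are all hypothetical stand-ins for the real model. It shows one property the abstract claims: each latent depends only on the frames heard so far, and one latent is produced per incoming frame, so no extra steps are added after the user stops speaking.

```python
import math
import random


def make_toy_step(dim, seed=0):
    """Build a hypothetical step function: (frame, prev_latent) -> new latent.

    The random weights stand in for a trained model; they are an assumption
    made purely for illustration.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def step(frame, prev_latent):
        new_latent = []
        for i in range(dim):
            s = sum(w_in[i][j] * frame[j] for j in range(dim))
            s += sum(w_rec[i][j] * prev_latent[j] for j in range(dim))
            new_latent.append(math.tanh(s))
        return new_latent

    return step


def think_while_listening(frames, step, dim):
    """Run one latent update per incoming frame.

    The latent produced at step t-1 is fed back as input at step t, so the
    latent at step t depends only on frames 0..t (causal).
    """
    latent = [0.0] * dim  # initial latent before any speech is heard
    trace = []
    for frame in frames:
        latent = step(frame, latent)
        trace.append(latent)
    return trace


if __name__ == "__main__":
    dim = 4
    step = make_toy_step(dim, seed=1)
    frames = [[0.1 * (t + i) for i in range(dim)] for t in range(5)]
    trace = think_while_listening(frames, step, dim)
    print(len(trace), "latents for", len(frames), "frames")
```

Because the loop advances in lockstep with the incoming frames, changing a later frame leaves every earlier latent unchanged, which is the causality constraint the abstract refers to.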