Mar 8, 2026 · arXiv:2603.07599

StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control

Haishu Zhao, Aokai Hao, Yuan Ge, Zhenqiang Hong, Tong Xiao, Jingbo Zhu

AI Summary

StyleBench, a new multi-turn dialogue benchmark, is introduced to evaluate the style intensity control of speech language models (SLMs) across emotion, speed, volume, and pitch. Experiments using StyleBench reveal performance gaps between current SLMs and omni language models (OLMs) in controlling speaking style intensity during conversations. The analysis identifies potential reasons for these gaps and suggests avenues for future research to improve style control in SLMs.

Key Contribution

SLMs still lag behind omni language models in multi-turn conversational style control, as revealed by the new StyleBench benchmark.

Abstract

Speech language models (SLMs) have significantly extended the interactive capability of text-based Large Language Models (LLMs) by incorporating paralinguistic information. For a more realistic interactive experience with customized styles, current SLMs have managed to interpret and control speaking style intensity from user prompts during the dialogue process. However, there remains a lack of systematic benchmarks that quantify and evaluate style intensity control ability in conversations. In this paper, we propose StyleBench, a multi-turn dialogue benchmark for comprehensively evaluating style intensity control ability across four dimensions: emotion, speed, volume, and pitch. Our results reveal performance gaps between leading SLMs and omni language models (OLMs), and suggest underlying reasons and promising approaches for future exploration.

Eval Frameworks & Benchmarks · Natural Language Processing · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
