This paper systematically analyzes the prevalence of verbal tics (repetitive linguistic patterns) in eight state-of-the-art LLMs, using a novel Verbal Tic Index (VTI) across 10,000 prompts in English and Chinese. The study reveals significant inter-model variation in VTI scores, with Gemini 3.1 Pro scoring highest and DeepSeek V3.2 lowest, and shows that tics accumulate in multi-turn conversations and are amplified in subjective tasks. Human evaluation confirms a strong inverse correlation between sycophancy and perceived naturalness, highlighting the "alignment tax" of current training paradigms.
LLMs are drowning in verbal tics (sycophantic openers and pseudo-empathetic affirmations), and this "alignment tax" significantly reduces perceived naturalness.
As Large Language Models (LLMs) continue to evolve through alignment techniques such as Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI, an increasingly conspicuous phenomenon has emerged: the proliferation of verbal tics, i.e., repetitive, formulaic linguistic patterns that pervade model outputs. These range from sycophantic openers ("That's a great question!", "Awesome!") to pseudo-empathetic affirmations ("I completely understand your concern", "I'm right here to catch you") and overused vocabulary ("delve", "tapestry", "nuanced"). In this paper, we present a systematic analysis of verbal tics across eight state-of-the-art LLMs: GPT-5.4, Claude Opus 4.7, Gemini 3.1 Pro, Grok 4.2, Doubao-Seed-2.0-pro, Kimi K2.5, DeepSeek V3.2, and MiMo-V2-Pro. Using a custom framework for standardized API-based evaluation, we assess 10,000 prompts spanning 10 task categories in both English and Chinese, yielding 160,000 model responses. We introduce the Verbal Tic Index (VTI), a composite metric quantifying tic prevalence, and analyze its correlation with sycophancy, lexical diversity, and human-perceived naturalness. Our findings reveal significant inter-model variation: Gemini 3.1 Pro exhibits the highest VTI (0.590), while DeepSeek V3.2 achieves the lowest (0.295). We further show that verbal tics accumulate over multi-turn conversations, are amplified in subjective tasks, and follow distinct cross-lingual patterns. Human evaluation (N = 120) confirms a strong inverse relationship between sycophancy and perceived naturalness (r = -0.87, p < 0.001). These results underscore the "alignment tax" of current training paradigms and highlight the urgent need for more authentic human-AI interaction frameworks.
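The abstract does not give the VTI formula. The snippet below is a minimal sketch, assuming a hand-picked regex lexicon seeded only with the examples quoted above, of how a composite tic score could be built: a weighted mix of the share of responses containing at least one tic and a capped tic density per 100 words. The lexicon, the 0.5 weighting, and the density cap are hypothetical choices, not the paper's definition. It also shows a type-token ratio (one common lexical-diversity measure; the paper's actual measure is not stated) and a plain Pearson r, the statistic reported for sycophancy versus naturalness.

```python
import math
import re
from typing import Sequence

# HYPOTHETICAL tic lexicon: only the examples quoted in the abstract,
# not the lexicon used in the paper.
TIC_PATTERNS = [
    r"that's a great question",
    r"\bawesome!",
    r"i completely understand your concern",
    r"i'm right here to catch you",
    r"\bdelve\b",
    r"\btapestry\b",
    r"\bnuanced\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in TIC_PATTERNS]
_WORD = re.compile(r"\w+")


def tic_hits(response: str) -> int:
    """Count all (non-overlapping) tic-pattern matches in one response."""
    return sum(len(p.findall(response)) for p in _COMPILED)


def type_token_ratio(response: str) -> float:
    """Unique lowercase word tokens divided by total word tokens."""
    tokens = _WORD.findall(response.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def toy_vti(responses: Sequence[str], weight: float = 0.5,
            density_cap: float = 5.0) -> float:
    """HYPOTHETICAL composite score in [0, 1]:
    weight * (share of responses with >= 1 tic)
    + (1 - weight) * min(tics per 100 words / density_cap, 1)."""
    if not responses:
        return 0.0
    hits = [tic_hits(r) for r in responses]
    words = sum(len(_WORD.findall(r)) for r in responses)
    presence = sum(h > 0 for h in hits) / len(responses)
    per_100_words = 100.0 * sum(hits) / words if words else 0.0
    density = min(per_100_words / density_cap, 1.0)
    return weight * presence + (1 - weight) * density


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

In use, one would compute `toy_vti` over each model's responses and correlate per-item scores with human ratings via `pearson_r`. The presence term rewards models that never use tics, while the density term penalizes models that stack several tics in one reply.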