This paper explores using text-trained LLMs to identify manipulated words within partially fake speech by framing it as a next-token prediction task. The study reveals that the LLM leverages editing-style patterns, specifically word-level polarity substitutions, learned from the training data to detect fake words. However, the model's reliance on these specific patterns limits its generalization to unseen editing styles.
LLMs can spot fake words in speech by recognizing common editing patterns, but this reliance on learned biases hinders generalization to new manipulation techniques.
Large language models (LLMs), trained on large-scale text, have recently attracted significant attention for their strong performance across a wide range of tasks. Motivated by this, we investigate whether a text-trained LLM can help localize fake words in partially fake speech, where only specific words within an utterance are edited. We build a speech LLM that performs fake word localization via next-token prediction. Experiments and analyses on AV-Deepfake1M and PartialEdit indicate that the model frequently relies on editing-style patterns learned from the training data, particularly the word-level polarity substitutions characteristic of these two databases, as cues for localizing fake words. Although such patterns provide useful information in an in-domain scenario, avoiding over-reliance on them and improving generalization to unseen editing styles remains an open question.
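To make the next-token-prediction framing concrete, here is a minimal sketch (an assumption for illustration, not the paper's actual serialization or model): word-level fake labels can be cast as a generation target by interleaving each transcript word with a tag token, so the model autoregressively emits the transcript annotated with `<real>`/`<fake>` markers, and word-level predictions are recovered by parsing the generated sequence.

```python
def serialize_targets(words, fake_flags):
    """Interleave words with <real>/<fake> tags to form a
    next-token-prediction target (hypothetical format)."""
    tokens = []
    for word, is_fake in zip(words, fake_flags):
        tokens.append(word)
        tokens.append("<fake>" if is_fake else "<real>")
    return " ".join(tokens)


def decode_fake_words(prediction):
    """Recover the words tagged <fake> from a generated sequence."""
    tokens = prediction.split()
    return [tokens[i - 1] for i, t in enumerate(tokens) if t == "<fake>"]


# Example: "amazing" stands in for a word-level polarity substitution
# (e.g. edited from "awful"), the editing style the paper highlights.
words = ["the", "movie", "was", "amazing"]
flags = [False, False, False, True]
target = serialize_targets(words, flags)
print(target)                     # the <real> movie <real> was <real> amazing <fake>
print(decode_fake_words(target))  # ['amazing']
```

Under this framing, localization accuracy reduces to how reliably the LLM emits the correct tag after each word, which is why cues such as polarity-flipped words in otherwise coherent text become such strong (and potentially over-relied-upon) signals.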