This paper introduces HIR-SDD, a novel speech deepfake detection framework that leverages Large Audio Language Models (LALMs) and chain-of-thought reasoning. The authors created a new human-annotated dataset to support the chain-of-thought reasoning process, enabling the model to justify its classifications. Experiments demonstrate that HIR-SDD achieves improved performance in speech deepfake detection while also offering interpretable explanations.
Speech deepfake detection gets a reasoning upgrade: HIR-SDD uses chain-of-thought prompting with Large Audio Language Models to not only detect fakes but also explain *why* it thinks they're fake.
Modern generative audio models can be misused by adversaries, for example to impersonate other people and gain access to private information. To mitigate this threat, speech deepfake detection (SDD) methods have emerged. Unfortunately, current SDD methods generally fail to generalize to new audio domains and generators. Moreover, they lack interpretability, in particular the human-like reasoning that would naturally explain why a given audio is attributed to the bona fide or spoof class and provide human-perceptible cues. In this paper, we propose HIR-SDD, a novel SDD framework that combines the strengths of Large Audio Language Models (LALMs) with chain-of-thought reasoning derived from a newly proposed human-annotated dataset. Experimental evaluation demonstrates both the effectiveness of the proposed method and its ability to provide reasonable justifications for its predictions.
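To make the general idea concrete, the combination of an LALM with chain-of-thought prompting for SDD might look like the sketch below. This is a minimal illustration, not HIR-SDD's actual pipeline: the prompt wording, the `stub_lalm` function, and the response format are all hypothetical stand-ins, since the abstract does not specify them.

```python
# Hypothetical sketch: chain-of-thought prompting of a Large Audio
# Language Model (LALM) for speech deepfake detection. The model call
# is stubbed with a canned response so the parsing logic is runnable;
# a real system would pass the audio and prompt to an actual LALM.

PROMPT = (
    "Listen to the audio clip. Reason step by step about "
    "human-perceptible cues (prosody, breathing, artifacts, background "
    "noise), then finish with 'Verdict: bona fide' or 'Verdict: spoof'."
)

def stub_lalm(audio_path: str, prompt: str) -> str:
    # Stand-in for an LALM inference call (hypothetical); returns a
    # fixed reasoning trace ending in a verdict line.
    return ("The prosody is unnaturally flat and no breath sounds are "
            "present, which is typical of synthetic speech.\n"
            "Verdict: spoof")

def detect(audio_path: str) -> tuple[str, str]:
    """Return (label, reasoning) for an audio clip."""
    response = stub_lalm(audio_path, PROMPT)
    reasoning, _, verdict = response.rpartition("Verdict:")
    return verdict.strip().lower(), reasoning.strip()

label, why = detect("clip.wav")
print(label)  # spoof
```

The key design point the abstract emphasizes is that the verdict is accompanied by a reasoning trace (`why` above) grounded in cues a human could also perceive, rather than an opaque score.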