Cornell | National Tutoring Observatory | Feb 23, 2026 | arXiv:2602.20061

Can You Tell It's AI? Human Perception of Synthetic Voices in Vishing Scenarios

Zoha Hayat Bhatti, Bakhtawar Ahtisham, Seemal Tausif, Niklas George, Nida ul Habib Bajwa, Mobin Javed

AI Summary

This paper investigates whether people can distinguish AI-generated from human-recorded voices in vishing scenarios. In a controlled online study, 22 participants evaluated 16 audio clips and classified each as human or AI; mean accuracy was 37.5%, below chance for a binary task. Signal Detection Theory analysis found near-zero discriminability, indicating that participants could not reliably tell synthetic from human voices and that vocal heuristics are an unreliable basis for authenticity judgments.

Key Contribution

Humans can't reliably tell the difference between AI-generated and human voices in vishing scams, even when they're confident they can.

Abstract

Large Language Models and commercial speech synthesis systems now enable highly realistic AI-generated voice scams (vishing), raising urgent concerns about deception at scale. Yet it remains unclear whether individuals can reliably distinguish AI-generated speech from human-recorded voices in realistic scam contexts and what perceptual strategies underlie their judgments. We conducted a controlled online study in which 22 participants evaluated 16 vishing-style audio clips (8 AI-generated, 8 human-recorded) and classified each as human or AI while reporting confidence. Participants performed poorly: mean accuracy was 37.5%, below chance in a binary classification task. At the stimulus level, misclassification was bidirectional: 75% of AI-generated clips were majority-labeled as human, while 62.5% of human-recorded clips were majority-labeled as AI. Signal Detection Theory analysis revealed near-zero discriminability (d' ≈ 0), indicating inability to reliably distinguish synthetic from human voices rather than simple response bias. Qualitative analysis of 315 coded excerpts revealed reliance on paralinguistic and emotional heuristics, including pauses, filler words, vocal variability, cadence, and emotional expressiveness. However, these surface-level cues traditionally associated with human authenticity were frequently replicated by AI-generated samples. Misclassifications were often accompanied by moderate to high confidence, suggesting perceptual miscalibration rather than uncertainty. Together, our findings demonstrate that authenticity judgments based on vocal heuristics are unreliable in contemporary vishing scenarios. We discuss implications for security interventions, user education, and AI-mediated deception mitigation.
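For readers unfamiliar with Signal Detection Theory, the sketch below shows one common way to compute the sensitivity index d' and the response criterion c from a table of responses. It is an illustration only, not the authors' analysis code: the function name `sdt_measures`, the choice of "AI" as the signal class, the log-linear correction, and all response counts are assumptions made for this example. Only the clip counts (8 AI-generated, 8 human-recorded) and the two percentages come from the abstract.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion), treating "AI" as the signal class.

    hits: AI clips labeled AI; misses: AI clips labeled human;
    false_alarms: human clips labeled AI; correct_rejections: human clips labeled human.
    """
    # Log-linear correction (assumed here): keeps rates strictly between 0 and 1
    # so the inverse normal CDF stays finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # standard normal quantile function
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion


# Hypothetical counts (NOT data from the paper): equal hit and false-alarm rates.
d_equal, c_equal = sdt_measures(hits=40, misses=40, false_alarms=40, correct_rejections=40)

# Hypothetical counts for a listener who discriminates well, for contrast.
d_good, c_good = sdt_measures(hits=70, misses=10, false_alarms=10, correct_rejections=70)

# Stimulus-level percentages from the abstract, converted to clip counts.
ai_clips_majority_human = round(0.75 * 8)   # 75% of the 8 AI-generated clips
human_clips_majority_ai = round(0.625 * 8)  # 62.5% of the 8 human-recorded clips
```

When the hit rate equals the false-alarm rate, d' is zero regardless of how often the listener answers "AI", which is why a near-zero d' points to a lack of discriminability rather than a response bias alone.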

Constitutional AI & AI Ethics | Natural Language Processing | Red-Teaming & Adversarial Robustness | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 32

Year: 2026

Venue: N/A
