The paper introduces SHIELD, a benchmark for evaluating AI text detectors that emphasizes reliability and stability across diverse domains and adversarial scenarios, addressing the limitations of existing benchmarks that focus primarily on AUROC. The authors propose a unified evaluation metric incorporating reliability and stability, both crucial for practical deployment, and develop a post-hoc, model-agnostic "humanification" framework with a controllable hardness parameter to generate challenging AI-generated text samples.
AI text detectors that ace standard benchmarks often crumble when faced with subtly human-like AI-generated text, exposing a critical gap in real-world readiness.
We present a novel evaluation paradigm for AI text detectors that prioritizes real-world and equitable assessment. Current approaches predominantly report conventional metrics such as AUROC, overlooking that even modest false positive rates constitute a critical impediment to the practical deployment of detection systems. Furthermore, real-world deployment necessitates a predetermined threshold configuration, making detector stability (i.e., the maintenance of consistent performance across diverse domains and adversarial scenarios) a critical factor. These aspects have been largely ignored in previous research and benchmarks. Our benchmark, SHIELD, addresses these limitations by integrating both reliability and stability factors into a unified evaluation metric designed for practical assessment. We also develop a post-hoc, model-agnostic humanification framework that modifies AI text to more closely resemble human authorship, incorporating a controllable hardness parameter. This hardness-aware approach effectively challenges the ability of current SOTA zero-shot detection methods to maintain both reliability and stability. (Data and code: https://github.com/navid-aub/SHIELD-Benchmark)
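To make the fixed-threshold stability point concrete, here is a minimal sketch (not the SHIELD metric itself): a detector threshold is calibrated once to a target false positive rate on a calibration domain, then held fixed while performance is checked on other domains. The domain names, score distributions, and helper functions are illustrative assumptions, not artifacts from the paper or repository.

```python
import numpy as np

def threshold_at_fpr(human_scores, target_fpr=0.01):
    """Pick the score threshold whose false positive rate on human text
    does not exceed the target (higher score = 'more likely AI')."""
    # The (1 - target_fpr) quantile of human-text scores bounds FPR at target_fpr.
    return np.quantile(human_scores, 1.0 - target_fpr)

def tpr_at_threshold(ai_scores, threshold):
    """True positive rate on AI-generated text at a fixed threshold."""
    return float(np.mean(ai_scores >= threshold))

# Hypothetical detector scores per domain: {domain: (human_scores, ai_scores)}.
rng = np.random.default_rng(0)
domains = {
    "news":   (rng.normal(0.2, 0.1, 1000), rng.normal(0.8, 0.1, 1000)),
    "essays": (rng.normal(0.3, 0.1, 1000), rng.normal(0.7, 0.1, 1000)),
    "code":   (rng.normal(0.4, 0.1, 1000), rng.normal(0.6, 0.1, 1000)),
}

# Calibrate once on a single domain, as a deployed detector would.
threshold = threshold_at_fpr(domains["news"][0], target_fpr=0.01)

# A stable detector keeps TPR and FPR consistent across the other domains
# at this fixed, pre-set threshold; large spread signals fragility.
tprs = {name: tpr_at_threshold(ai, threshold) for name, (_, ai) in domains.items()}
fprs = {name: float(np.mean(h >= threshold)) for name, (h, _) in domains.items()}
print("TPR per domain:", tprs)
print("FPR per domain:", fprs)
print("TPR spread (max - min):", max(tprs.values()) - min(tprs.values()))
```

A detector with a high AUROC can still show a large TPR or FPR spread here, which is the gap between conventional threshold-free reporting and the deployment-oriented evaluation the benchmark targets.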