CAS | Mar 16, 2026 | arXiv:2603.14968

Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework

Zhuoshang Wang, Yubing Ren, Yanan Cao, Fang Fang, Xiaoxue Li, Li Guo

AI Summary

This paper introduces TTP-Detect, a black-box framework for non-intrusive, third-party verification of LLM watermarks, addressing the limitations of existing secret-key schemes that require access to keys or provider-specific detectors. TTP-Detect decouples detection from injection by using a proxy model to amplify watermark signals and employing relative measurements to assess the alignment of query text with watermarked distributions. Experiments show TTP-Detect achieves strong detection performance and robustness across various watermarking schemes, datasets, and models.

Key Contribution

A watermark detection method that requires neither the model provider's secret key nor its scheme-specific detector, enabling independent auditing of LLM provenance.

Abstract

While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or provider-side scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.
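The abstract frames verification as a relative test: a query text is judged by how it sits against reference distributions rather than by recomputing a keyed statistic. The sketch below is only a toy illustration of that idea, not the paper's method. It assumes each text has already been reduced to one hypothetical scalar score (standing in for whatever signal a proxy model might produce), and the function names, the two measurements, and the `alpha` threshold are all assumptions made for illustration.

```python
from statistics import mean, stdev


def empirical_p_value(query_score, null_scores):
    """Add-one smoothed fraction of unwatermarked reference scores
    that are at least as large as the query score."""
    exceed = sum(1 for s in null_scores if s >= query_score)
    return (exceed + 1) / (len(null_scores) + 1)


def relative_alignment(query_score, wm_scores, null_scores):
    """Standardized distance to the unwatermarked mean minus the
    standardized distance to the watermarked mean.
    Positive values mean the query sits closer to the watermarked references."""
    z_null = abs(query_score - mean(null_scores)) / stdev(null_scores)
    z_wm = abs(query_score - mean(wm_scores)) / stdev(wm_scores)
    return z_null - z_wm


def detect(query_score, wm_scores, null_scores, alpha=0.05):
    """Flag the query as watermarked only if it is unlikely under the
    unwatermarked references AND closer to the watermarked references.
    Both criteria are illustrative assumptions, not the paper's test."""
    p = empirical_p_value(query_score, null_scores)
    align = relative_alignment(query_score, wm_scores, null_scores)
    return p <= alpha and align > 0


if __name__ == "__main__":
    # Synthetic placeholder scores, not experimental data.
    null_refs = [0.1 * i for i in range(20)]
    wm_refs = [3.0 + 0.1 * i for i in range(20)]
    print(detect(3.5, wm_refs, null_refs))
    print(detect(0.55, wm_refs, null_refs))
```

Combining two measurements mirrors the abstract's mention of complementary relative measurements: the p-value alone only says the query is unusual for unwatermarked text, while the alignment term checks that it is unusual in the direction of the watermarked references.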

Constitutional AI & AI Ethics | Eval Frameworks & Benchmarks | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
