PKU | Apr 30, 2026 | arXiv:2604.27666

VOW: Verifiable and Oblivious Watermark Detection for Large Language Models

Xiaokun Luan, Yihao Zhang, Pengcheng Su, Feiran Lei, Meng Sun

AI Summary

This paper introduces VOW, a watermarking protocol for LLMs that enables privacy-preserving and cryptographically verifiable watermark detection. VOW formulates detection as a secure two-party computation built on a Verifiable Oblivious Pseudorandom Function (VOPRF), so the user can verify the integrity of the provider's detection result without revealing the text to the provider. The evaluation shows that VOW is practical for short texts and reassesses watermark robustness against modern paraphrasing attacks.

Key Contribution

Watermark detection for LLMs doesn't have to sacrifice privacy: VOW lets a user check text for a watermark, and verify the provider's result, without revealing the text to the provider.

Abstract

Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a centralized trust model. This model forces users to reveal potentially sensitive text to a provider for detection and offers no way to verify the integrity of the result. While asymmetric schemes have been proposed to address these issues, they are either impractical for short texts or lack formal guarantees linking watermark insertion and detection. We propose VOW, a new protocol that achieves both privacy-preserving and cryptographically verifiable watermark detection with high efficiency. Our approach formulates detection as a secure two-party computation problem, instantiating the watermark's core logic with a Verifiable Oblivious Pseudorandom Function (VOPRF). This allows the user and provider to perform detection without the user's text being revealed, while the provider's result is verifiable. Our comprehensive evaluation shows that VOW is practical for short texts and provides a crucial reassessment of watermark robustness against modern paraphrasing attacks.

Constitutional AI & AI Ethics | Natural Language Processing | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 41

Year: 2026

Venue: N/A
