ANL | Nuclear Science and Engineering Division | University of Nevada | Apr 6, 2026 | arXiv:2604.05242

XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts

Jiahao Xu, Rui Hu, Olivera Kotevska, Zikai Zhang

AI Summary

The paper introduces XMark, a new multi-bit watermarking scheme for LLM-generated text that improves the trade-off between decoding accuracy and text quality. XMark's encoder minimizes logit distribution distortion during watermarked token generation, while its decoder is optimized for recovering messages from short text sequences. Experiments across various tasks demonstrate that XMark outperforms existing methods in decoding accuracy and text quality, especially when the generated text is short.

Key Contribution

Existing multi-bit LLM watermarking schemes lose decoding accuracy when the generated text is short; XMark maintains high decoding accuracy and text quality even with limited tokens.

Abstract

Multi-bit watermarking has emerged as a promising solution for embedding imperceptible binary messages into Large Language Model (LLM)-generated text, enabling reliable attribution and tracing of malicious usage of LLMs. Despite recent progress, existing methods still face key limitations: some become computationally infeasible for large messages, while others suffer from a poor trade-off between text quality and decoding accuracy. Moreover, the decoding accuracy of existing methods drops significantly when the number of tokens in the generated text is limited, a condition that frequently arises in practical usage. To address these challenges, we propose XMark, a novel method for encoding and decoding binary messages in LLM-generated texts. The unique design of XMark's encoder produces a less distorted logit distribution for watermarked token generation, preserving text quality, and also enables its tailored decoder to reliably recover the encoded message with limited tokens. Extensive experiments across diverse downstream tasks show that XMark significantly improves decoding accuracy while preserving the quality of watermarked text, outperforming prior methods. The code is at https://github.com/JiiahaoXU/XMark.

Constitutional AI & AI Ethics | Natural Language Processing | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 26

Year: 2026

Venue: N/A
