Feb 26, 2026 · arXiv:2602.23248

Mitigating Legibility Tax with Decoupled Prover-Verifier Games

Yegon Kim, Juho Lee

AI Summary

The paper addresses the "legibility tax" in prover-verifier games, where models trained for checkability lose accuracy relative to a baseline trained only for correctness. The authors propose decoupling correctness from checkability by introducing a "translator" model that converts the output of a fixed "solver" model into a checkable form. Because the solver is trained first to maximize correctness and the translator is trained separately, the solver's answer is retained while checkability improves.

Key Contribution

Decoupling correctness from checkability in prover-verifier games mitigates the legibility tax: a translator makes a fixed solver's outputs checkable while retaining the solver's answer.

Abstract

As large language models become increasingly capable, it is critical that their outputs can be easily checked by less capable systems. Prover-verifier games can be used to improve the checkability of model outputs, but they display a degradation in accuracy compared to a baseline trained only to maximize correctness, a phenomenon named the legibility tax. We propose a solution by decoupling the correctness condition from the checkability condition and instead training a "translator" model that turns a fixed solver model's solution into a checkable form. This allows us to first train the solver to maximize correctness, and then train the translator to translate the solver's solution into a checkable form while retaining the solver's answer. To accommodate this new objective of translation, we formulate a decoupled prover-verifier game where the equilibria correspond to faithful and checkable translators.
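
The two conditions named in the abstract, faithfulness (the translation keeps the solver's answer) and checkability (a verifier accepts the translation), can be illustrated with a toy scoring function. This is only a sketch under stated assumptions and not the paper's objective: the `Solution` type, `is_faithful`, `translator_reward`, and `toy_verifier` are hypothetical names introduced here, and the rule that an unfaithful translation scores zero is an illustrative choice.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Solution:
    """Hypothetical container: free-form reasoning plus a final answer."""
    reasoning: str
    answer: str


def is_faithful(solver_out: Solution, translated: Solution) -> bool:
    # Faithful here means the translation keeps the solver's final answer.
    return translated.answer.strip() == solver_out.answer.strip()


def translator_reward(
    solver_out: Solution,
    translated: Solution,
    verifier: Callable[[Solution], float],
) -> float:
    # Assumption for illustration: an unfaithful translation earns nothing,
    # otherwise the reward is the verifier's acceptance score in [0, 1].
    if not is_faithful(solver_out, translated):
        return 0.0
    return verifier(translated)


def toy_verifier(sol: Solution) -> float:
    # Stand-in for a weaker checker: accepts only if the reasoning
    # has at least two non-empty lines (i.e. some steps are shown).
    steps = [line for line in sol.reasoning.splitlines() if line.strip()]
    return 1.0 if len(steps) >= 2 else 0.0


solver_out = Solution(reasoning="17*3=51", answer="51")
faithful = Solution(reasoning="17*3 = 10*3 + 7*3\n= 30 + 21 = 51", answer="51")
unfaithful = Solution(reasoning="17*3 = 10*3 + 7*3\n= 30 + 22 = 52", answer="52")
terse = Solution(reasoning="51", answer="51")

reward_faithful = translator_reward(solver_out, faithful, toy_verifier)
reward_unfaithful = translator_reward(solver_out, unfaithful, toy_verifier)
reward_terse = translator_reward(solver_out, terse, toy_verifier)
```

In this toy setup the solver is never modified; only the translation is scored, which mirrors the separation the abstract describes between training for correctness and training for checkability.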

Reasoning & Chain-of-Thought · RLHF & Preference Learning · Scalable Oversight & Alignment Theory

Citation Metrics

Citations: 0

Influential citations: 0

References: 15

Year: 2026

Venue: N/A
