The paper addresses the "legibility tax" in prover-verifier games, where models sacrifice accuracy for checkability. The authors propose decoupling correctness from checkability by introducing a "translator" model that converts the output of a fixed, already-trained "solver" model into a checkable format. Because the translator is trained separately, the solver's accuracy is preserved while checkability improves.
Decoupling correctness from checkability in prover-verifier games eliminates the legibility tax, enabling more reliable verification of LLM outputs.
As large language models become increasingly capable, it is critical that their outputs can be easily checked by less capable systems. Prover-verifier games can be used to improve checkability of model outputs, but display a degradation in accuracy compared to a baseline trained only to maximize correctness -- a phenomenon named legibility tax. We propose a solution by decoupling the correctness from the checkability condition and instead training a "translator" model that turns a fixed solver model's solution into a checkable form. This allows us to first train the solver to maximize correctness, and then train the translator to translate the solver's solution into a checkable form while retaining the solver's answer. To accommodate this new objective of translation, we formulate a decoupled prover-verifier game where the equilibria correspond to faithful and checkable translators.
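As a rough illustration of the decoupling described above, the sketch below scores a translator only on whether it keeps the solver's answer (faithfulness) and whether a verifier accepts its output (checkability). It is not the paper's formal game, which the abstract does not spell out; the names `Solution`, `translator_reward`, and `toy_verifier`, and the binary reward, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Solution:
    answer: str     # final answer
    reasoning: str  # text a verifier would inspect


def translator_reward(
    solver_out: Solution,
    translated: Solution,
    verifier_accepts: Callable[[Solution], bool],
) -> float:
    """Hypothetical reward: 1.0 only if the translation is faithful AND checkable.

    Correctness of the answer is deliberately absent here: in the decoupled
    setup the solver is trained for correctness beforehand and then held fixed.
    """
    faithful = translated.answer == solver_out.answer
    checkable = verifier_accepts(translated)
    return 1.0 if (faithful and checkable) else 0.0


def toy_verifier(s: Solution) -> bool:
    # Stand-in for a learned verifier (assumption): accepts step-labelled reasoning.
    return "Step" in s.reasoning


solver_out = Solution(answer="42", reasoning="opaque scratch work")
faithful_checkable = Solution(answer="42", reasoning="Step 1: ... Step 2: ...")
unfaithful = Solution(answer="41", reasoning="Step 1: ...")
uncheckable = Solution(answer="42", reasoning="opaque scratch work")

for candidate in (faithful_checkable, unfaithful, uncheckable):
    print(translator_reward(solver_out, candidate, toy_verifier))
```

Under these assumptions, a translation that changes the solver's answer earns nothing even if the verifier would accept it, which is the sense in which the translator is asked to retain the solver's answer.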