By jointly processing candidate solutions, the Multi-Sequence Verifier slashes LLM latency in parallel test-time scaling by 50% while maintaining accuracy.
Model-free reinforcement learning can achieve asymptotic optimality: AIQI learns without environment models by directly inducing action-value functions.
Decoupling correctness from checkability in prover-verifier games eliminates the legibility tax, enabling more reliable verification of LLM outputs.