Model-free reinforcement learning can achieve asymptotic optimality: AIQI learns without environment models by directly inducing action-value functions.
Decoupling correctness from checkability in prover-verifier games eliminates the legibility tax, enabling more reliable verification of LLM outputs.