Even when a computer-use agent succeeds at a task once, inconsistent task specification and variable agent behavior can undermine its reliability.
Injecting demonstrations with a carefully annealed probability can drastically improve exploration in RLVR, even for tasks requiring novel reasoning or domain-specific knowledge.