Achieve better token efficiency in LLM policy optimization with FiberPO, a novel objective whose Jacobian is block-diagonal over trajectories and reduces to the identity on-policy.