JD Explore Academy
Achieve better token efficiency in LLM policy optimization by using a novel FiberPO objective whose Jacobian is block-diagonal over trajectories and reduces to identity on-policy.