Carleton University {lichang93, xuechao19, xiaodong.he}@jd.com {zhihaoxu, yarenzhang}@cmail.carleton.ca
Achieve better token efficiency in LLM policy optimization with FiberPO, a novel objective whose Jacobian is block-diagonal over trajectories and reduces to the identity on-policy.