By strategically resampling from deep, recoverable states ("pivots") within unsuccessful trajectories, DDE drastically improves LLM reinforcement learning over methods that oversample from the root or blindly disperse their budgets.