Beijing Institute of Technology
Self-conditioning on verified trajectories boosts reinforcement learning performance by over 8%, revealing the power of internal feedback in credit assignment.
PathRouter reduces reliance on shortcuts in reinforcement learning, leading to more reliable and contextually rich decision-making in language-model agents.
AdaPLD achieves up to 3.10x faster decoding by combining lexical and semantic strategies for token retrieval and hypothesis generation.