University of Science and Technology of China, NLPR, MAIS, Institute of Automation, Chinese Academy of Sciences
Achieves state-of-the-art results on agentic knowledge base question answering (KBQA) by distilling gold-action policies into on-policy student rollouts, bridging the gap between sparse rewards and weakly supervised intermediate actions.