Search papers, labs, and topics across Lattice.
1
0
3
LLMs can turn sparse rewards into dense training signals for RL agents, achieving comparable performance with significantly higher sample efficiency.