Scaffolding LLMs with hints during RL training can boost both initial accuracy *and* long-horizon reasoning performance, but only if the hints mimic student behavior and are gradually withdrawn.