Skip the expensive reward model: RewardFlow distills sparse task rewards into dense, state-level signals by propagating credit through the topology of LLM reasoning trajectories.