Don't let valuable steps in failed trajectories go unnoticed: GraphGPO leverages state-transition graphs for fine-grained credit assignment in agentic RL, boosting performance and efficiency.
Flow-based imitation learning can be significantly improved by distilling both rewards and actions on-policy, enabling more robust and generalizable policies, especially with limited or noisy demonstrations.
LLMs can be forced to generalize beyond initial constraints by actively searching for adversarial test cases that expose logical divergences in generated code.