Search papers, labs, and topics across Lattice.
2
0
5
Flow-based imitation learning can be significantly improved by distilling both rewards and actions on-policy, enabling more robust and generalizable policies, especially with limited or noisy demonstrations.
Forget end-to-end training and unstable RL: this staged learning approach with a novel Bias-DPO objective lets vision-language models plan physically plausible actions better than GPT-4o.