Search papers, labs, and topics across Lattice.
1
0
3
9
Forget end-to-end training and unstable RL: this staged learning approach with a novel Bias-DPO objective lets vision-language models plan physically plausible actions better than GPT-4o.