Turns out, a two-stage training pipeline can cut critical errors in VLM-generated image-editing instructions in half, leading to state-of-the-art editing performance.
Forget end-to-end training and unstable RL: this staged learning approach with a novel Bias-DPO objective lets vision-language models plan physically plausible actions better than GPT-4o.