ByteDance Seed
Image editing models can be significantly improved by replacing monolithic reward scores with a chain-of-thought reasoning verifier that breaks down instructions into distinct principles and evaluates the edited image against each.
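To make the idea concrete, here is a minimal sketch of how a decomposed reward could replace a single scalar score. All names here (`PrincipleVerdict`, `reward`, the mean aggregation rule) are illustrative assumptions, not the paper's actual interface:

```python
from dataclasses import dataclass

@dataclass
class PrincipleVerdict:
    """One principle extracted from the edit instruction, plus the verifier's score."""
    principle: str   # e.g. "the background outside the masked region is unchanged"
    score: float     # verifier's confidence in [0, 1] that the edit satisfies it

def reward(verdicts: list[PrincipleVerdict]) -> float:
    """Aggregate per-principle verdicts into one reward.

    Hypothetical mean rule; a real system might weight principles or
    require all of them to pass.
    """
    return sum(v.score for v in verdicts) / len(verdicts)
```

The point of the decomposition is that a failed principle (say, `score=0.0` for "the subject's pose is preserved") is visible and attributable, rather than being averaged invisibly into one opaque scalar by a monolithic reward model.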
Scaling visual preference optimization hinges on data quality: with a sufficiently large and clean preference dataset, standard DPO suffices, while a novel Poly-DPO objective becomes crucial on noisy data.
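For reference, the "standard DPO" baseline mentioned above optimizes the well-known Bradley-Terry-style loss over preference pairs. A minimal sketch for a single pair follows; the Poly-DPO variant is not specified here, so only the standard objective is shown:

```python
import math

def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    logp_w, logp_l:         policy log-probs of the preferred / rejected output
    ref_logp_w, ref_logp_l: frozen reference-model log-probs of the same outputs
    beta:                   scale on the implicit reward margin
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)): small when the policy already prefers the chosen output
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference model exactly, the margin is zero and the loss is log 2; mislabeled pairs in a noisy dataset push the gradient in the wrong direction, which is the failure mode a robust objective would need to address.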