ByteDance Seed
Scaling visual preference optimization hinges on data quality: standard DPO suffices given a sufficiently large, clean dataset, while a novel Poly-DPO objective becomes crucial when the data is noisy.
Seedance 2.0 leapfrogs existing models by unifying multi-modal inputs (text, image, audio, video) into a single architecture for generating high-quality, longer-duration audio-video content.
RLHF can be made more stable and effective by explicitly verifying and reinforcing policy improvements against a historical baseline, rather than relying solely on instantaneous reward signals.
A reward model trained on spatial-relationship preferences beats proprietary models at spatial understanding in text-to-image generation and enables stronger RL-based image generation.