Shanghai AI Lab, Nanyang Technological University
Text-to-image flow models can achieve superior preference alignment by augmenting the condition space, creating a "dense" reward mapping that better captures inter-sample relationships.
Forget textual rules and coarse embeddings: a multimodal reward model that directly compares rendered visuals unlocks significant gains in vision-to-code RL.
Hallucinations in RL-based image editing and generation are tamed with FIRM, a new framework that trains robust reward models on curated datasets to provide more accurate guidance.
Diffusion models can now reason their way through complex spatial tasks with near-perfect accuracy, thanks to a new framework that unlocks chain-of-thought reasoning within the latent space.
By mimicking how humans use visual anchors, ChartVSR lets models iteratively correct their own visual perception errors, leading to more accurate chart parsing.
Ditch the clunky pipelines: SongGen generates complete songs from text in a single pass, offering unprecedented control over musical elements and voice cloning.