Shanghai AI Laboratory, Shanghai AI Lab, Nanyang Technological University
Streaming spatial intelligence remains a significant hurdle for multimodal LLMs, with top models trailing human experts by 27 points in allocentric mapping tasks.
LLM agents can win 82 of 402 games by leveraging structured pseudocode, which changes how they interact with skill libraries.
Forget textual rules and coarse embeddings: a multimodal reward model that directly compares rendered visuals unlocks significant gains in vision-to-code RL.
Text-to-image flow models can achieve superior preference alignment by augmenting the condition space, creating a "dense" reward mapping that better captures inter-sample relationships.
Hallucinations in RL-based image editing and generation are tamed with FIRM, a new framework that trains robust reward models on curated datasets to provide more accurate guidance.
Diffusion models can now reason their way through complex spatial tasks with near-perfect accuracy, thanks to a new framework that unlocks chain-of-thought reasoning within the latent space.
By mimicking how humans use visual anchors, ChartVSR lets models iteratively correct their own visual perception errors, leading to more accurate chart parsing.
Ditch the clunky pipelines: SongGen generates complete songs from text in a single pass, offering unprecedented control over musical elements and voice cloning.