University of Science and Technology of China
AnchorEdit achieves state-of-the-art performance in multi-turn image editing, preserving subject identity across 10+ editing turns to support iterative design workflows.
Goal2Pixel performs vision-language navigation with fewer than 8 VLM calls per episode, substantially improving efficiency on these tasks.
Mutual interaction among enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables a unified multimodal model to achieve spatial intelligence beyond general visual competence.
JoyAI-RA bridges human manipulation and robotic control, using multi-source pretraining to improve cross-embodiment behavior learning.
A neural network trained to correct coarse-grained physics simulations makes immersed-boundary flow simulations 200x faster without sacrificing accuracy.
A reinforcement learning approach with hybrid rewards enables interleaved multimodal generation in existing vision-language models without requiring large interleaved datasets.
Instead of a monolithic action decoder, AtomicVLA uses a skill-guided mixture-of-experts, yielding significant gains in long-horizon robotic manipulation and continual learning.