Xiaodan Liang

Publication activity (papers/week, last 8 weeks)

Research focus

Robotics & Embodied AI (7), Multimodal Models (4), Computer Vision (4), Open-Source Models & Weights (1)

Frequent co-authors

Liang Ma (3), Meng Cao (2), Ivan Laptev (2), Kaidong Zhang (1)

Papers (7)

Apr 7, 2026

A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model

Achieve state-of-the-art robot manipulation success rates while slashing inference costs by up to 72% with A1, a fully open-source VLA framework that adaptively truncates computation.

Kaidong Zhang, Jian Zhang, Rongtao Xu +23

Multimodal Models, Open-Source Models & Weights, Robotics & Embodied AI

Mar 30, 2026

ManipArena: Comprehensive Real-world Evaluation of Reasoning-Oriented Generalist Robot Manipulation

Current robot manipulation benchmarks fail to capture the messy reality of real-world deployment, so this work introduces a new benchmark, ManipArena, to close the sim2real gap.

Yu Sun, Meng Cao, Ping Yang +16

Eval Frameworks & Benchmarks, Robotics & Embodied AI, World Models & Planning

Mar 16, 2026

AnyCrowd: Instance-Isolated Identity-Pose Binding for Arbitrary Multi-Character Animation

Achieve controllable multi-character animation with arbitrary numbers of characters by preventing identity entanglement and improving identity-pose binding via instance-isolated latent representations and decoupled attention.

Zhenyu Xie, Ji Xia, Michael Kampffmeyer +5

Computer Vision, Robotics & Embodied AI

Mar 10, 2026 · also SYSU

Implicit Geometry Representations for Vision-and-Language Navigation from Web Videos

Unlock the power of web videos for embodied AI: implicit geometry representations let agents learn to navigate from real-world room tours without relying on fragile 3D reconstruction.

Haihong Hao, Liang Ma, Kamila Zhumakhanova +3

Computer Vision, Multimodal Models, Robotics & Embodied AI

Mar 9, 2026 · also CAS, INRIA, SJTU, SYSU +1

Choose What to Observe: Task-Aware Semantic-Geometric Representations for Visuomotor Policy

Visuomotor policies can learn to ignore distracting visual variations simply by preprocessing raw RGB images into task-aware, semantic-geometric representations *before* feeding them to the policy.

Haoran Ding, Liang Ma, Yaxun Yang +3

Computer Vision, Multimodal Models, Robotics & Embodied AI

Mar 8, 2026 · also SYSU

AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots

Forget monolithic action decoders: AtomicVLA's skill-guided mixture-of-experts unlocks significant gains in long-horizon robotic manipulation and continual learning.

Likui Zhang, Tao Tang, Xiuwei Chen +7

Multimodal Models, Robotics & Embodied AI, Tool Use & Agents

Feb 24, 2026 · also Lenovo

WildGHand: Learning Anti-Perturbation Gaussian Hand Avatars from Monocular In-the-Wild Videos

Reconstructing realistic 3D hand avatars from messy, real-world video just got a whole lot better thanks to a new method that explicitly models and suppresses visual "noise" like motion blur and object interactions.

Hanhui Li, Xuan Huang, Wanquan Liu +4