Junyang Lin

Papers on Lattice

Research focus

Multimodal Models (4) · Tool Use & Agents (3) · Code Generation & Program Synthesis (2) · Eval Frameworks & Benchmarks (2)

Frequent co-authors

Dayiheng Liu (2) · Shuai Bai (2) · Ruizhe Chen (2) · Shixuan Liu (2)

Papers (7)

May 28, 2026

Qiuyue Wang +42 · also Adelaide, PolyU

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

One model to control them all: Qwen-VLA achieves impressive zero-shot generalization across diverse robotic tasks and embodiments by unifying vision-language-action modeling.

Qiuyue Wang, Mingsheng Li, Jian Guan +40

Multimodal Models · Robotics & Embodied AI · Tool Use & Agents

May 25, 2026

DAMO · also Tsinghua AI, HIT, HKU, SJTU

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Forget hand-crafted benchmarks: CUA-Gym's auto-generated training data lets computer-use agents crush existing open-source models on real-world tasks.

Bowen Wang, Dunjie Lu, Junli Wang +10

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Tool Use & Agents

Mar 25, 2026

Yu-Hao Yang +8 · also SJTU

GenMask: Adapting DiT for Segmentation via Direct Mask Generation

Ditch the feature extraction pipeline: GenMask directly generates segmentation masks with a diffusion transformer, achieving SOTA results by harmonizing mask and image generation in a single model.

Yu-Hao Yang, Xianwei Zhuang, Yuxuan Cai +6

Architecture Design (Transformers, SSMs, MoE) · Computer Vision

Mar 18, 2026

Tsinghua AI · also DAMO, SJTU

Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos

Forget real-world video datasets: training VLMs on just 7.7K synthetic videos built around temporal primitives outperforms training on 165K real-world examples, unlocking surprisingly effective transfer to video reasoning.

Songtao Jiang, Sibo Song, Chenyi Zhou +7

Computer Vision · Data Curation & Synthetic Data · Multimodal Models

Mar 17, 2026

Tsinghua AI · also DAMO

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Multi-hop data synthesis using HopChain boosts VLM performance across a wide range of tasks, with gains of over 50 points in accuracy for ultra-long-context reasoning.

Shenzhi Wang, Shixuan Liu, Chang Gao +5

Data Curation & Synthetic Data · Multimodal Models · Reasoning & Chain-of-Thought

Mar 11, 2026

Tsinghua AI · also DAMO, Nankai University, NJU, Scale +1

CodePercept: Code-Grounded Visual STEM Perception for MLLMs

Forget scaling reasoning: this work shows that scaling visual perception with code-grounded data is the real key to unlocking MLLMs' STEM abilities.

Tongkun Guan, Jianqiang Wan, Mingkun Yang +12

Code Generation & Program Synthesis · Multimodal Models · Reasoning & Chain-of-Thought

Feb 15, 2026

Tsinghua AI · also DAMO, BUPT

Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents

A new family of GUI agents, GUI-Owl-1.5, leapfrogs existing open-source models on 20+ GUI benchmarks, proving that multi-platform, real-time GUI automation is now within reach.

Haiyang Xu, Haiyang Xu, Xi Zhang +32

Eval Frameworks & Benchmarks · Open-Source Models & Weights · Tool Use & Agents
