Zhuang Liu

Papers on Lattice

Research focus

Multimodal Models (5), Computer Vision (3), Open-Source Models & Weights (2), Inference & Quantization (1)

Frequent co-authors

Yida Yin (2), Xingyu Fu (2), Boya Zeng (1), Tianze Luo (1)

Papers (6)

Jun 9, 2026

6d ago

i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models

i1 matches the performance of leading text-to-image models and sets a new standard for fully open models, outperforming the best of them by nearly 30 percentage points.

Boya Zeng, Tianze Luo, Shu Pu +4

Computer Vision, Multimodal Models, Open-Source Models & Weights

Jun 8, 2026

1w ago·also Columbia, Harvard, LLNL, Princeton +1

End-to-End Context Compression at Scale

LCLMs redefine the efficiency of long-context inference, achieving superior compression without sacrificing model quality.

Ang Li, Sean McLeish, Haozhe Chen +12

Inference & Quantization, Scaling Laws & Emergent Abilities

Jun 4, 2026

1w ago·also Waterloo

WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark

Even the top-performing MLLMs struggle with visual reasoning, achieving only 64% accuracy on a benchmark designed to reflect real-world diversity.

Yida Yin, H. Krishnakumar, Chung Peng Lee +9

Eval Frameworks & Benchmarks, Multimodal Models

May 28, 2026

Zhipeng Cai +5·2w ago

VLM3: Vision Language Models Are Native 3D Learners

Forget complex architectures and task-specific designs: VLMs are already native 3D learners with the right training recipe.

Zhipeng Cai, Zhuang Liu, Yunyang Xiong +3

Computer Vision, Multimodal Models

Apr 10, 2026·also NYU

VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images

VLMs can get a 10% boost in spatial reasoning and 3D understanding by training on just 10,000 synthetic images generated automatically from task keywords.

Yida Yin, Xingyu Fu, Zhuang Liu

Computer Vision, Data Curation & Synthetic Data, Multimodal Models

Apr 6, 2026

Vero: An Open RL Recipe for General Visual Reasoning

Open-sourcing Vero, a VLM trained with RL on a diverse 600K-sample dataset, closes the performance gap with proprietary models and reveals that broad task coverage, not just scale, is the key to unlocking general visual reasoning.

Linrong Cai, Qunzhong Wang, Haoyang Wu +1

Multimodal Models, Open-Source Models & Weights, Reasoning & Chain-of-Thought
