Lijun Wu

Papers on Lattice

Total citations

902

Topics

h-index

Research focus

Multimodal Models (3) · Computer Vision (2) · Data Curation & Synthetic Data (2) · Code Generation & Program Synthesis (1) · Reasoning & Chain-of-Thought (1)

Frequent co-authors

Honglin Lin (3) · Mengzhang Cai (3) · Zheng Liu (2) · Xiaoyang Wang (2)

Papers (6)

Apr 7, 2026

Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning

An 8B model can now generate scientific graphics code that rivals or surpasses the output of much larger proprietary models, thanks to a new dataset, benchmark, and reinforcement learning approach.

Juekai Lin, Yun Zhu, Honglin Lin +7

Code Generation & Program Synthesis · Multimodal Models · Reasoning & Chain-of-Thought

Apr 6, 2026

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

Forget bigger models: massive gains in document parsing accuracy are still possible through smarter data engineering alone.

Tianyao He, Linke Ouyang, Fan Wu +41

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Data Curation & Synthetic Data

Mar 16, 2026

Tsinghua AI

Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing

Forget tedious fine-tuning: leveraging molecule identifiers as visual prompts unlocks surprisingly powerful zero-shot chemical reaction diagram parsing in VLMs.

Jiahe Song, Yinfan Wang, Rui Nie +2

Computer Vision · Multimodal Models · Scientific Discovery & Drug Design

Mar 7, 2026

Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

Forget scaling laws: targeted data engineering, specifically multi-stage distillation and difficulty-aware sampling, allows an 8B model to outperform larger open-source financial LLMs.

Chuxue Cao, Honglin Lin, Zhanping Zhong +5

Data Curation & Synthetic Data · Eval Frameworks & Benchmarks · Inference & Quantization

Jan 20, 2026

Also PKU, RUC, Shanghai AI Lab

ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch

Forget simplistic synthetic data: ChartVerse generates complex charts and reliable reasoning data from scratch, enabling an 8B model to outperform its 30B teacher in chart reasoning.

Zheng Liu, Honglin Lin, Chonghan Qin +13

Apr 14, 2025

Tsinghua AI · also NUS, CUHK, Deakin, Fudan +10

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Open-source multimodal models just leveled up: InternVL3 rivals closed-source titans like GPT-4o by pre-training vision and language together from the start.

Jinguo Zhu, Weiyun Wang, Zhe Chen +45

901 citations

Multimodal Models · Open-Source Models & Weights · Training Efficiency & Optimization