Juanzi Li

Papers on Lattice

Total citations

Topics

Publication activity (papers/week, last 8 weeks)

Research focus

Tool Use & Agents (3), RLHF & Preference Learning (3), Computer Vision (2), Eval Frameworks & Benchmarks (2)

Frequent co-authors

Xiaozhi Wang (3), Chuangxin Zhao (2), Yijian LU (2), Ji Qi (2)

Papers (8)

Jun 24, 2026

Chuangxin Zhao +9 · 4d ago · also ECNU

HG-Bench: A Benchmark for Multi-Page Handwritten Answer-Region Grounding in Automated Homework Assessment

No existing model can effectively ground the spatial structure of student reasoning in multi-page handwritten homework, revealing a significant gap in automated assessment capabilities.

Chuangxin Zhao, Boyan Shi, Yanling Wang +7

Computer Vision, Eval Frameworks & Benchmarks, Multimodal Models

Jun 23, 2026

Tsinghua AI · 5d ago · also CAS

An LMM for Precisely Grounding Elements in Documents

PreciseDoc achieves unprecedented precision in grounding critical document elements, transforming how LMMs can interpret complex text-rich environments.

Yijian LU, Chuangxin Zhao, Kai Sun +3

Computer Vision, Multimodal Models

Jun 16, 2026

Tsinghua AI · 1w ago · also AI Laboratory, SEU

EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning

By harnessing implicit supervision from environment dynamics, EnvRL boosts RL success rates by over 4% on long-horizon tasks, revealing a new frontier in agentic learning.

Zhitong Wang, Songze Li, Hao Peng +4

Tool Use & Agents, World Models & Planning

Jun 11, 2026

Tsinghua AI · 2w ago

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

Environment engineering, not just agent workflows, is the key to unlocking the full potential of autonomous scientific discovery, as demonstrated by EurekAgent's record-breaking results.

Amy Xin, Jiening Siow +12

Scientific Discovery & Drug Design, Tool Use & Agents

Jun 3, 2026

Tsinghua AI · 3w ago · also HIT, XJTU

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

Reward hacking in rubric-based RL is not just common; it can be systematically reproduced and analyzed using the new CHERRL environment, revealing hidden biases that could compromise training integrity.

Xuekang Wang, Zhuoyuan Hao, Shuo Hou +3

Constitutional AI & AI Ethics, RLHF & Preference Learning

May 29, 2026

Tsinghua AI · May 29, 2026

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

LLMs can be taught to reason more comprehensively over long contexts by rewarding not just the final answer, but also the quality of the reasoning steps taken to arrive at that answer.

Nianyi Lin, Jiajie Zhang, Lei Hou +1

Reasoning & Chain-of-Thought, Recommendation & Information Retrieval, Tool Use & Agents

May 26, 2026

Tsinghua AI · May 26, 2026

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

Forget external signals – unlock better LLM post-training by mining model internals with sparse autoencoders to reveal data diversity, difficulty, and quality.

Yi Jing, Zao Dai, Jinwu Hu +4

Data Curation & Synthetic Data, Interpretability & Mechanistic Interp, RLHF & Preference Learning

May 6, 2026

Haotian Xia +5 · May 6, 2026 · also Tsinghua AI, Airbnb

StoryAlign: Evaluating and Training Reward Models for Story Generation

Current reward models are surprisingly bad at judging story quality, achieving only 66% accuracy in selecting human-preferred narratives – a gap closed by a new, purpose-built reward model.

Haotian Xia, Yunjia Qi, Xiaozhi Wang +3

Eval Frameworks & Benchmarks, Natural Language Processing, RLHF & Preference Learning