Yueting Zhuang

Publication activity (papers/week, last 8 weeks)

Research focus

Multimodal Models (6) · Computer Vision (5) · Tool Use & Agents (5) · Reasoning & Chain-of-Thought (3)

Frequent co-authors

Yongliang Shen (9) · Jun Xiao (8) · Zhengxi Lu (5) · Weiming Lu (5)

Papers (11)

Jul 1, 2026

Hongxing Li +13 · 1w ago

Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning

Decoupling perception from reasoning in visual tasks leads to a remarkable 93.2% accuracy on V-Star, showcasing a new paradigm for fine-grained visual reasoning.

Hongxing Li, Xiufeng Huang, Dingming Li +11

Computer Vision · Multimodal Models · Reasoning & Chain-of-Thought

Apr 15, 2026

Apr 15, 2026 · also Kuaishou

UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization

Offloading memory and computation to a copilot lets a 7B parameter GUI agent outperform larger models on long-horizon tasks, suggesting a path to more efficient and capable GUI automation.

Zhengxi Lu, Fei Tang, Guangyi Liu +6

Multimodal Models · Tool Use & Agents

Ding Li +17 · Apr 15, 2026 · also ZJU

SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

Forget noisy pseudo-labels: SpatialEvo unlocks self-supervised 3D spatial reasoning by generating perfectly accurate training data directly from scene geometry.

Ding Li, Dinging Li, Yingxiu Zhao +15

Computer Vision · Robotics & Embodied AI · World Models & Planning

Fei Tang +10 · Apr 15, 2026 · also CAU, CLAI-LAB

UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding

Uncertainty-driven zoom-in boosts GUI grounding accuracy by up to 13.4% without any retraining, showing that targeted attention to model uncertainty can significantly improve performance.

Fei Tang, Bofan Chen, Zhengxi Lu +8

Computer Vision · Multimodal Models · Natural Language Processing

Apr 13, 2026

Fei Tang +6 · Apr 13, 2026

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Finally, a unified open-source framework lets you train, evaluate, and deploy GUI agents across real devices and chat platforms, closing the gap between research and real-world application.

Fei Tang, Zhiqiong Lu, Boxuan Zhang +4

Eval Frameworks & Benchmarks · Tool Use & Agents

Apr 13, 2026 · also NJU

LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

Object-centric vision could be the key to unlocking LMMs' potential for precise object manipulation and fine-grained spatial reasoning, capabilities currently beyond their reach.

Wenqiao Zhang, Juekai Lin, Yu Zhong +5

Computer Vision · Multimodal Models · Tool Use & Agents

Apr 9, 2026

Haolei Xu +8 · Apr 9, 2026

Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts

Multimodal models can "see" the image but still fail at reasoning because the visual input distracts the routing mechanism from activating the right experts.

Haolei Xu, Haiwen Hong, Rui Zhou +6

Architecture Design (Transformers, SSMs, MoE) · Multimodal Models · Reasoning & Chain-of-Thought

Apr 9, 2026

KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

Even frontier models like Claude Sonnet 4.6 stumble when asked to infer user preferences and proactively assist in mobile tasks, achieving less than 50% success despite excelling at explicit task execution.

Tongbo Chen, Tong-I Chen, Zhengxi Lu +16

Eval Frameworks & Benchmarks · Tool Use & Agents

Apr 2, 2026

Zhengxi Lu +12 · Apr 2, 2026 · also Tsinghua AI, BNRist, Department of Automation, People's Daily Online +1

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

LLM agents can internalize skills via in-context RL, achieving zero-shot autonomous behavior without the token overhead and retrieval noise of traditional methods.

Zhengxi Lu, Zhiyuan Yao, Zhiyuan Yao +10

RLHF & Preference Learning · Tool Use & Agents · Training Efficiency & Optimization

Mar 30, 2026

Corresponding author · Mar 30, 2026 · also ZJU

LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization

Current multimodal systems struggle with logical flow in visual sequences because they neglect visual logic, but LogiStory tackles this head-on, turning narrative coherence into an explicit objective.

Chutian Meng, Fan Ma, Jiaxu Miao +1

Computer Vision · Multimodal Models · Reasoning & Chain-of-Thought

Mar 16, 2026

Aozhe Wang +5 · Mar 16, 2026

Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

By adversarially co-evolving code and test LLMs, Code-A1 achieves code generation performance on par with human-annotated training, while simultaneously boosting the LLM's ability to find bugs.

Aozhe Wang, Nan Zhou, Zhengxi Lu +3

Code Generation & Program Synthesis · Red-Teaming & Adversarial Robustness · RLHF & Preference Learning
