Kaifu Zhang

Papers on Lattice

Research focus

Multimodal Models (2) · Reasoning & Chain-of-Thought (1) · Tool Use & Agents (1) · Computer Vision (1) · RLHF & Preference Learning (1)

Frequent co-authors

Wenhao Yang (1) · Wenhao Yang (1) · Jinlong Huang (1) · Shiyin Lu (1)

Papers (2)

Stanford HAI · Apr 8, 2026 · also University of Science and Technology

Walk the Talk: Bridging the Reasoning-Action Gap for Thinking with Images via Multimodal Agentic Policy Optimization

Multimodal LLMs (MLLMs) can "think" with images, but their actions often don't match their reasoning. This paper addresses that gap with a new training method that forces them to explain what they see.

Wenhao Yang, Wenhao Yang, Jinlong Huang +9

Multimodal Models · Reasoning & Chain-of-Thought · Tool Use & Agents

Haonan Jia +4 · Mar 2, 2026

Cross-modal Identity Mapping: Minimizing Information Loss in Modality Conversion via Reinforcement Learning

Get better image captions without more data: reinforcement learning can train vision-language models to focus on image details by maximizing the similarity between images retrieved using the generated captions.

Haonan Jia, Shichao Dong, Zenghui Sun +2

Computer Vision · Multimodal Models · RLHF & Preference Learning
