CAS | Feb 26, 2026 | arXiv:2602.22839

DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

Haolin Zheng, Hao Zheng, Guozhao Mo, Xinru Yan, Qianhao Yuan, Wenkai Zhang, Xuanang Chen, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun

AI Summary

DeepPresenter is an agentic framework for presentation generation that autonomously plans, renders, and revises slides based on user intent and environmental observations. It uses environment-grounded reflection, conditioning the generation process on perceptual artifact states (rendered slides) to identify and correct presentation-specific issues. Experiments demonstrate state-of-the-art performance on diverse presentation generation scenarios, with a fine-tuned 9B model achieving competitive results at a lower cost.

Key Contribution

By grounding reflection in the visual artifacts of presentation slides, DeepPresenter enables agents to iteratively refine presentations in a way that internal reasoning traces alone cannot.

Abstract

Presentation generation requires deep content research, coherent visual design, and iterative refinement based on observation. However, existing presentation agents often rely on predefined workflows and fixed templates. To address this, we present DeepPresenter, an agentic framework that adapts to diverse user intents, enables effective feedback-driven refinement, and generalizes beyond a scripted pipeline. Specifically, DeepPresenter autonomously plans, renders, and revises intermediate slide artifacts to support long-horizon refinement with environmental observations. Furthermore, rather than relying on self-reflection over internal signals (e.g., reasoning traces), our environment-grounded reflection conditions the generation process on perceptual artifact states (e.g., rendered slides), enabling the system to identify and correct presentation-specific issues during execution. Results on the evaluation set covering diverse presentation-generation scenarios show that DeepPresenter achieves state-of-the-art performance, and the fine-tuned 9B model remains highly competitive at substantially lower cost. Our project is available at: https://github.com/icip-cas/PPTAgent
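The render, observe, and revise cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the real system is an LLM-driven agent that observes rendered slides, whereas here `render` is a plain text-wrapping stand-in for the environment, and every name and constant (`Slide`, `render`, `reflect`, `revise`, `generate`, `MAX_LINES`, `LINE_WIDTH`) is hypothetical. The one point it tries to show is that reflection reads the rendered artifact rather than the draft that produced it.

```python
from dataclasses import dataclass
import textwrap

MAX_LINES = 4     # hypothetical slide capacity, in rendered lines
LINE_WIDTH = 40   # hypothetical characters per rendered line


@dataclass
class Slide:
    title: str
    body: str


def render(slide: Slide) -> list[str]:
    """Toy 'environment': lay the body out as wrapped lines of text."""
    return textwrap.wrap(slide.body, width=LINE_WIDTH)


def reflect(rendered: list[str]) -> list[str]:
    """Inspect the rendered artifact (not the draft) and report issues."""
    issues = []
    if len(rendered) > MAX_LINES:
        issues.append("overflow")
    return issues


def revise(slide: Slide, rendered: list[str]) -> list[Slide]:
    """Split an overflowing slide at the rendered line boundary."""
    head = " ".join(rendered[:MAX_LINES])
    tail = " ".join(rendered[MAX_LINES:])
    return [Slide(slide.title, head), Slide(slide.title, tail)]


def generate(slides: list[Slide], max_rounds: int = 10) -> list[Slide]:
    """Render, reflect, and revise until no issues remain or the budget ends."""
    for _ in range(max_rounds):
        revised, changed = [], False
        for slide in slides:
            rendered = render(slide)
            if "overflow" in reflect(rendered):
                revised.extend(revise(slide, rendered))
                changed = True
            else:
                revised.append(slide)
        slides = revised
        if not changed:
            break
    return slides
```

In this sketch, overflow is only detectable after rendering, because it depends on how the text wraps rather than on the raw character count. The same idea motivates conditioning revisions on rendered slides instead of on reasoning traces.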

Multimodal Models, Tool Use & Agents, World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
