ByteDance · Mar 7, 2026 · arXiv:2603.07244

PresentBench: A Fine-Grained Rubric-Based Benchmark for Slide Generation

Xin-Sheng Chen, Jiayuan Zhu, Pei-lin Li, Han Wang, Shuojin Yang, Meng-Hao Guo

AI Summary

PresentBench is a fine-grained, rubric-based benchmark for evaluating automated slide generation, designed to address the limitations of coarse-grained, holistic evaluations. It includes 238 instances with background materials and an average of 54.1 binary checklist items per instance, enabling detailed, instance-specific assessment. Experiments show that PresentBench gives more reliable results than existing evaluation methods and aligns more closely with human preferences. The benchmark also reveals that NotebookLM significantly outperforms other slide generation methods.

Key Contribution

Stop guessing whether your slide generation model is any good: PresentBench offers a fine-grained, rubric-based benchmark whose evaluations align with human preferences.

Abstract

Slides serve as a critical medium for conveying information in presentation-oriented scenarios such as academia, education, and business. Despite their importance, creating high-quality slide decks remains time-consuming and cognitively demanding. Recent advances in generative models, such as Nano Banana Pro, have made automated slide generation increasingly feasible. However, existing evaluations of slide generation are often coarse-grained and rely on holistic judgments, making it difficult to accurately assess model capabilities or track meaningful advances in the field. In practice, the lack of fine-grained, verifiable evaluation criteria poses a critical bottleneck for both research and real-world deployment. In this paper, we propose PresentBench, a fine-grained, rubric-based benchmark for evaluating automated real-world slide generation. It contains 238 evaluation instances, each supplemented with background materials required for slide creation. Moreover, we manually design an average of 54.1 checklist items per instance, each formulated as a binary question, to enable fine-grained, instance-specific evaluation of the generated slide decks. Extensive experiments show that PresentBench provides more reliable evaluation results than existing methods, and exhibits significantly stronger alignment with human preferences. Furthermore, our benchmark reveals that NotebookLM significantly outperforms other slide generation methods, highlighting substantial recent progress in this domain.
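The abstract describes each instance as a set of binary checklist questions but does not state here how the answers are aggregated. A minimal sketch, assuming each item receives a yes/no verdict from a judge and an instance is scored as the fraction of items it satisfies (then averaged across instances), might look like the following. `ChecklistItem`, `instance_score`, and `benchmark_score` are hypothetical names, and the example questions are invented for illustration, not taken from PresentBench.

```python
# Hypothetical sketch: assumes fraction-of-items scoring per instance and an
# unweighted mean across instances. This is not PresentBench's published code.
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    """One binary rubric question plus the judge's yes/no verdict."""
    question: str
    passed: bool  # assumed to be produced beforehand by a human or model judge


def instance_score(items: list[ChecklistItem]) -> float:
    """Fraction of checklist items a single generated deck satisfies (0.0 to 1.0)."""
    if not items:
        raise ValueError("an instance needs at least one checklist item")
    return sum(item.passed for item in items) / len(items)


def benchmark_score(instances: list[list[ChecklistItem]]) -> float:
    """Unweighted mean of per-instance scores (macro average), so each instance
    counts equally regardless of how many checklist items it has."""
    if not instances:
        raise ValueError("no instances to score")
    return sum(instance_score(items) for items in instances) / len(instances)


# Invented example decks, for illustration only.
deck_a = [
    ChecklistItem("Title slide names the project?", True),
    ChecklistItem("Timeline slide lists all three milestones?", False),
    ChecklistItem("Budget figure matches the source document?", True),
    ChecklistItem("No slide exceeds the text-density limit?", True),
]
deck_b = [
    ChecklistItem("Method slide includes the architecture diagram?", True),
    ChecklistItem("Results slide reports the main metric?", True),
]

print(instance_score(deck_a))             # 0.75
print(benchmark_score([deck_a, deck_b]))  # 0.875
```

Averaging per-instance fractions keeps an instance with many checklist items from dominating the overall score; pooling every item across all instances into a single fraction would be the alternative design, and it would weight deck_a twice as heavily as deck_b in this example.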

Eval Frameworks & Benchmarks · Multimodal Models · Natural Language Processing

Citation Metrics

Citations: 0
Influential citations: 0
References: 52
Year: 2026
Venue: N/A
