BUPT | Apr 7, 2026 | arXiv:2604.05623

DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions

Xinran Wang, Xiao Zhang, Haolong Yan, Muxi Diao, Songyu Xu, Zhonghao Yan, Hongbing Li, Kongming Liang, Zhanyu Ma

AI Summary

DetailVerifyBench is introduced as a new benchmark to evaluate the ability of MLLMs to localize hallucinations within long image captions. The benchmark consists of 1,000 images across five domains, each paired with a detailed caption averaging over 200 words, and features token-level annotations indicating various types of hallucinations. Experiments using DetailVerifyBench reveal that existing MLLMs struggle to precisely identify and localize hallucinations in long captions, highlighting a significant gap in current capabilities.

Key Contribution

Shows that current MLLMs struggle to pinpoint the specific hallucinated words within long, detailed image captions.

Abstract

Accurately detecting and localizing hallucinations is a critical task for ensuring high reliability of image captions. In the era of Multimodal Large Language Models (MLLMs), captions have evolved from brief sentences into comprehensive narratives, often spanning hundreds of words. This shift exponentially increases the challenge: models must now pinpoint specific erroneous spans or words within extensive contexts, rather than merely flag response-level inconsistencies. However, existing benchmarks lack the fine granularity and domain diversity required to evaluate this capability. To bridge this gap, we introduce DetailVerifyBench, a rigorous benchmark comprising 1,000 high-quality images across five distinct domains. With an average caption length of over 200 words and dense, token-level annotations of multiple hallucination types, it stands as the most challenging benchmark for precise hallucination localization in the field of long image captioning to date. Our benchmark is available at https://zyx-hhnkh.github.io/DetailVerifyBench/.
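To make the idea of token-level localization concrete, the following is a minimal sketch of how a predicted set of hallucinated tokens could be compared against gold annotations. It is an illustration only: the abstract does not specify the benchmark's scoring protocol, so the precision/recall/F1 formulation, the function name `token_level_scores`, the whitespace tokenization, and the example caption and indices are all assumptions rather than the authors' method.

```python
from typing import Dict, Set


def token_level_scores(gold: Set[int], predicted: Set[int]) -> Dict[str, float]:
    """Precision, recall, and F1 over hallucinated token indices.

    Hypothetical helper: `gold` holds the indices annotated as hallucinated,
    `predicted` holds the indices a model flagged.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up caption, split on whitespace (an assumed tokenization).
caption_tokens = "A red bus is parked beside two tall trees".split()

# Assumed gold annotation: "red" (index 1) and "two" (index 6) are hallucinated.
gold = {1, 6}
# Assumed model output: flags "red" (index 1) and "tall" (index 7).
predicted = {1, 7}

scores = token_level_scores(gold, predicted)
print([caption_tokens[i] for i in sorted(gold & predicted)])  # correctly flagged tokens
print(scores)
```

In this toy case the model localizes one of the two hallucinated tokens and raises one false alarm. A response-level check would only report that the caption contains an error, which is the distinction the abstract draws.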

Computer Vision, Eval Frameworks & Benchmarks, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 31

Year: 2026

Venue: N/A
