This paper investigates the verifiability of hallucinations in multimodal large language models (MLLMs), distinguishing between "obvious" and "elusive" hallucinations based on how easily human users can detect them. The authors construct a dataset of 4,470 human responses to AI-generated hallucinations to categorize the two types. By learning separate activation-space probes for each type, they demonstrate fine-grained control over the verifiability of a model's hallucinations, enabling targeted interventions that regulate how detectable those hallucinations are.
You can dial up or down how obvious an AI's hallucinations are, giving you control over whether users catch the errors.
AI applications driven by multimodal large language models (MLLMs) are prone to hallucinations that pose considerable risks to human users. Crucially, such hallucinations are not equally problematic: some hallucinated content can be readily detected by human users (i.e., obvious hallucinations), while other content is often missed or requires substantial verification effort (i.e., elusive hallucinations). This indicates that multimodal AI hallucinations vary significantly in their verifiability. Yet little research has explored how to control this property for AI applications with diverse security and usability demands. To address this gap, we construct a dataset from 4,470 human responses to AI-generated hallucinations and categorize these hallucinations into obvious and elusive types based on their verifiability by human users. We then propose an activation-space intervention method that learns separate probes for obvious and elusive hallucinations. We find that the two hallucination types elicit distinct intervention directions, allowing fine-grained control over the verifiability of a model's hallucinations. Empirical results demonstrate the efficacy of this approach and show that type-matched interventions are most effective at regulating the corresponding verifiability. Moreover, simply mixing the two interventions enables flexible control over the degree of verifiability required in different scenarios.
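The abstract does not give implementation details, but the general recipe it describes (fitting separate probes on a model's internal activations and using them as steering directions, with mixing for intermediate control) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the names (`fit_probe`, `steer`, `make_toy_data`), the synthetic data, and the choice of logistic probes with additive steering are hypothetical, not the paper's code.

```python
# Minimal sketch of activation-space probing and steering, assuming the
# common recipe of fitting linear probes on cached hidden states and using
# their weight vectors as steering directions. Hypothetical names and toy
# data; this is NOT the paper's released implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n = 64, 400  # toy hidden size / sample count; real MLLM layers are larger

def make_toy_data(direction: torch.Tensor):
    """Synthetic activations separable along `direction`, standing in for
    cached MLLM hidden states labeled by human verifiability."""
    labels = (torch.rand(n) > 0.5).float()
    acts = torch.randn(n, d) + labels[:, None] * 2.0 * direction
    return acts, labels

def fit_probe(acts: torch.Tensor, labels: torch.Tensor,
              steps: int = 500, lr: float = 1e-2) -> torch.Tensor:
    """Fit a logistic-regression probe on activations; return its unit
    weight vector, which doubles as a steering direction."""
    w = torch.zeros(d, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(acts @ w + b, labels)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return w / w.norm()

# Separate probes for the two hallucination types (toy directions here).
dir_obvious = fit_probe(*make_toy_data(F.normalize(torch.randn(d), dim=0)))
dir_elusive = fit_probe(*make_toy_data(F.normalize(torch.randn(d), dim=0)))

def steer(hidden: torch.Tensor, alpha: float, beta: float) -> torch.Tensor:
    """Additively shift a layer's hidden states along a mixture of the two
    probe directions; the mixing coefficients trade off the two
    verifiability types, as the abstract describes."""
    return hidden + alpha * dir_obvious + beta * dir_elusive

# e.g., push hidden states toward more human-detectable (obvious)
# hallucinations while suppressing the elusive direction.
h = torch.randn(1, d)  # stand-in for one token's hidden state
h_steered = steer(h, alpha=4.0, beta=-2.0)
```

In a real setting, `steer` would be applied at a chosen transformer layer during generation (e.g., via a forward hook), and the coefficients would be tuned per scenario; the paper's mixing result suggests that interpolating between the two learned directions is enough for flexible control.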