The paper introduces REVIS, a training-free framework to mitigate object hallucination in LVLMs by re-activating suppressed visual information in deeper layers. REVIS leverages latent space geometry to extract a pure visual information vector via orthogonal projection and then applies a calibrated, sparse intervention strategy at specific network depths. Experiments on standard benchmarks show REVIS reduces object hallucination by ~19% compared to SOTA baselines, while maintaining general reasoning abilities.
A surprisingly simple, training-free intervention in latent space cuts object hallucination in LVLMs by nearly 20% without sacrificing reasoning.
Despite their advanced capabilities, Large Vision-Language Models (LVLMs) frequently suffer from object hallucination. One cause is that visual features become entangled with, and suppressed by, pretrained textual representations in the deeper network layers. To address this, we propose REVIS, a training-free framework that explicitly re-activates this suppressed visual information. Rooted in latent space geometry, REVIS extracts a pure visual information vector via orthogonal projection and employs a calibrated strategy that intervenes sparsely, only at the depths where suppression occurs. This surgical approach restores visual information at minimal computational cost. Empirical evaluations on standard benchmarks demonstrate that REVIS reduces object hallucination rates by approximately 19% relative to state-of-the-art baselines, while preserving general reasoning capabilities.
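The abstract's core operation, extracting the part of a visual feature vector that is orthogonal to a textual subspace and re-injecting it into a hidden state, can be sketched as follows. This is a minimal illustration of the general orthogonal-projection idea, not the paper's actual implementation; the function names, the `alpha` scale, and the assumption that the textual subspace is given as a basis matrix are all hypothetical.

```python
import numpy as np

def orthogonal_reject(v, basis):
    """Remove from `v` its component lying in the span of `basis`
    (rows are basis vectors), keeping only the orthogonal residual.
    This residual plays the role of the 'pure visual information' vector."""
    # Orthonormalize the basis rows via QR on the transpose.
    q, _ = np.linalg.qr(basis.T)   # q: (d, k) with orthonormal columns
    projection = q @ (q.T @ v)     # component of v inside the subspace
    return v - projection

def reactivate(hidden, visual, text_basis, alpha=1.0):
    """Hypothetical sparse intervention at one chosen layer: add the
    text-orthogonal part of the visual vector back into the hidden
    state, scaled by alpha (a calibration knob, assumed here)."""
    v_pure = orthogonal_reject(visual, text_basis)
    return hidden + alpha * v_pure
```

For example, with a textual subspace spanned by the first coordinate axis, `orthogonal_reject` strips exactly the first component of the visual vector; in REVIS this projection would be applied only at the specific depths where suppression is detected, which is what keeps the intervention cheap.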