Apr 30, 2026 · arXiv:2604.27529

Adjoint Inversion Reveals Holographic Superposition and Destructive Interference in CNN Classifiers

AI Summary

The paper introduces a hallucination-free inversion framework for CNNs based on magnitude-phase decoupling and Local Adjoint Correctors, enabling accurate spatial gradient analysis. This framework reveals that CNN encoders exhibit holographic superposition, where individual channels contain both positive and negative weight reconstructions that are visually indistinguishable but cancel out to highlight the foreground. The study demonstrates that classification operates through destructive interference, directly challenging the Spatial Funnel Hypothesis and linking channel requirements to the volume of the admissible interference subspace.

Key Contribution

CNN classifiers do not merely select from cleaned features; they actively cancel shared background information via destructive interference, rewriting our understanding of how these networks actually "see".
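To make the cancellation idea concrete, here is a toy NumPy sketch. It is not the paper's inversion framework: the arrays `background` and `foreground`, the function name `toy_interference`, and the weights +1 and -1 are all hypothetical choices made for illustration. It only shows how two reconstructions that each carry the same background pattern can sum, with opposite signs, to a foreground-only residual.

```python
import numpy as np


def toy_interference(seed=0, size=8):
    """Build two hypothetical per-channel reconstructions and their signed sum.

    Both reconstructions contain the same shared background pattern; they
    differ only in the sign of a small foreground residual (an assumption
    made for this toy, not a result taken from the paper).
    """
    rng = np.random.default_rng(seed)
    background = rng.normal(size=(size, size))   # shared direction in pixel space
    foreground = np.zeros((size, size))
    foreground[2:6, 2:6] = 1.0                   # hypothetical object mask

    pos_recon = background + 0.5 * foreground    # reconstruction with weight +1
    neg_recon = background - 0.5 * foreground    # reconstruction with weight -1

    # Signed sum: the shared background cancels, the residuals add up.
    combined = (+1.0) * pos_recon + (-1.0) * neg_recon
    return pos_recon, neg_recon, combined, foreground


pos_recon, neg_recon, combined, foreground = toy_interference()
print("background energy left outside the mask:",
      float(np.abs(combined[foreground == 0]).sum()))
```

In this toy, each reconstruction on its own is dominated by the background, while the signed sum is supported only on the foreground mask.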

Abstract

A foundational assumption in CNN interpretability -- that deep encoders suppress background pixels while classifiers merely select from a cleaned feature pool (the Spatial Funnel Hypothesis) -- remains untested due to spatial hallucinations in existing visualization tools. We address this by introducing a hallucination-free inversion framework built on magnitude-phase decoupling and Local Adjoint Correctors. Our method mathematically guarantees that the spatial gradient support of every reconstruction stems strictly from genuinely active channels. Using this framework as a geometric probe, we uncover the first pixel-level evidence of strong superposition in vision encoders. We show that per-channel inversions are uniformly holographic: positive and negative weight reconstructions are visually and energetically indistinguishable. However, their algebraic sum sharply concentrates on the foreground. This proves classification operates via destructive interference -- classifier weights cancel a shared background direction in pixel space and constructively assemble class-discriminative residuals, directly falsifying the Spatial Funnel Hypothesis. This interference model identifies the volume of the admissible interference subspace as the geometric quantity governing channel requirements. We prove this volume is dual to the GAP covariance determinant, yielding a covariance-volume channel selection algorithm with a $(1-1/e)$ approximation guarantee. This algorithm mathematically reveals out-of-distribution (OOD) failure as a measurable collapse of the covariance volume essential for interference-based classification. Our framework extends seamlessly to attention-based heads without retraining.
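The abstract does not spell out the covariance-volume channel selection algorithm, so the following is only a sketch of one standard way to obtain a $(1-1/e)$ guarantee: greedy maximization of a monotone submodular log-determinant objective. The objective `log det(I + cov_S / noise)`, the `noise` parameter, and the function names `logdet_volume` and `greedy_channel_selection` are assumptions introduced here, not the authors' definitions. The input `gap_features` is assumed to be a matrix of global-average-pooled activations with shape (samples, channels).

```python
import numpy as np


def logdet_volume(cov, idx, noise=1.0):
    """Return log det(I + cov[idx, idx] / noise) for the channel subset idx.

    This regularized log-determinant is a common monotone submodular proxy
    for covariance volume (an assumed objective, not the paper's).
    """
    if len(idx) == 0:
        return 0.0
    sub = cov[np.ix_(idx, idx)]
    _, logdet = np.linalg.slogdet(np.eye(len(idx)) + sub / noise)
    return float(logdet)


def greedy_channel_selection(gap_features, k, noise=1.0):
    """Greedily pick up to k channels with the largest marginal volume gain."""
    cov = np.atleast_2d(np.cov(gap_features, rowvar=False))  # (C, C) covariance
    n_channels = cov.shape[0]
    selected = []
    for _ in range(min(k, n_channels)):
        current = logdet_volume(cov, selected, noise)
        # Marginal gain of adding each remaining channel to the current set.
        gains = {
            c: logdet_volume(cov, selected + [c], noise) - current
            for c in range(n_channels)
            if c not in selected
        }
        selected.append(max(gains, key=gains.get))
    return selected


# Usage on synthetic features: channel 1 is nearly a scaled copy of channel 0.
rng = np.random.default_rng(0)
a = 3.0 * rng.normal(size=500)
b = 2.0 * rng.normal(size=500)
feats = np.stack([a, 0.9 * a + 1e-3 * rng.normal(size=500), b], axis=1)
print(greedy_channel_selection(feats, k=2))
```

Under this assumed objective, a redundant channel adds little volume, so the greedy step prefers channels that span new directions. The same `logdet_volume` value could be tracked on in-distribution versus shifted data as a rough proxy for the covariance-volume collapse the abstract associates with OOD failure.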

Architecture Design (Transformers, SSMs, MoE) · Computer Vision Interpretability & Mechanistic Interp

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
