George Mason University | UDelaware | Apr 9, 2026 | arXiv:2604.08192

Inside-Out: Measuring Generalization in Vision Transformers Through Inner Workings

Yunxiang Peng, Mengmeng Ma, Ziyu Yao, Xi Peng

AI Summary

This paper introduces an approach to measuring generalization in Vision Transformers (ViTs) by analyzing their internal circuits, i.e., the causal interactions between internal representations. The authors derive two metrics from these circuits: Dependency Depth Bias (DDB) for pre-deployment model selection and Circuit Shift Score (CSS) for post-deployment performance monitoring under distribution shift. Across various tasks, DDB and CSS correlate more strongly with generalization performance than existing proxy metrics, with average improvements of 13.4% and 34.1%, respectively.

Key Contribution

Model internals, not just outputs, can predict generalization: circuit-based metrics improve on existing proxies by an average of 13.4% (pre-deployment model selection) and 34.1% (post-deployment monitoring) in correlation with ViT generalization performance.

Abstract

Reliable generalization metrics are fundamental to the evaluation of machine learning models. Especially in high-stakes applications where labeled target data are scarce, evaluation of models' generalization performance under distribution shift is a pressing need. We focus on two practical scenarios: (1) Before deployment, how to select the best model for unlabeled target data? (2) After deployment, how to monitor model performance under distribution shift? The central need in both cases is a reliable and label-free proxy metric. Yet existing proxy metrics, such as model confidence or accuracy-on-the-line, are often unreliable as they only assess model outputs while ignoring the internal mechanisms that produce them. We address this limitation by introducing a new perspective: using the inner workings of a model, i.e., circuits, as a predictive metric of generalization performance. Leveraging circuit discovery, we extract the causal interactions between internal representations as a circuit, from which we derive two metrics tailored to the two practical scenarios. (1) Before deployment, we introduce Dependency Depth Bias, which measures different models' generalization capability on target data. (2) After deployment, we propose Circuit Shift Score, which predicts a model's generalization under different distribution shifts. Across various tasks, both metrics demonstrate significantly improved correlation with generalization performance, outperforming existing proxies by an average of 13.4% and 34.1%, respectively. Our code is available at https://github.com/deep-real/GenCircuit.
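The abstract evaluates proxy metrics by how well they correlate with generalization performance. The sketch below illustrates one common way to score a label-free proxy, using Spearman rank correlation between proxy values and held-out accuracies. It is not the authors' implementation: the abstract does not say which correlation measure is used or how Dependency Depth Bias and Circuit Shift Score are computed, so the function names, the `higher_is_better` convention, and the toy numbers are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(proxy: Sequence[float], accuracy: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(proxy) != len(accuracy) or len(proxy) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(proxy), average_ranks(accuracy))


def select_model(proxy_scores: Dict[str, float], higher_is_better: bool = True) -> str:
    """Pick a model by proxy score alone (no target labels needed).

    The direction of the proxy is an assumption of this sketch.
    """
    pick = max if higher_is_better else min
    return pick(proxy_scores, key=proxy_scores.get)


if __name__ == "__main__":
    # Toy, made-up numbers for illustration only; not results from the paper.
    proxy_scores = {"model_a": 0.61, "model_b": 0.74, "model_c": 0.58, "model_d": 0.80}
    target_accuracy = {"model_a": 0.70, "model_b": 0.77, "model_c": 0.72, "model_d": 0.81}
    names = sorted(proxy_scores)
    rho = spearman([proxy_scores[n] for n in names], [target_accuracy[n] for n in names])
    print(f"Spearman correlation between proxy and accuracy: {rho:.2f}")
    print("Model selected by proxy:", select_model(proxy_scores))
```

In this setup the target accuracies are only needed to evaluate the proxy itself; at deployment time `select_model` would run on proxy scores alone, which is the label-free setting the abstract describes.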

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Interpretability & Mechanistic Interp

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
