Tsinghua AI | Beijing Key Laboratory of Embodied | Jun 3, 2026 | arXiv:2606.04767

Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms

Chong Zhang, Xiang Li, Jia Wang, Qiufeng Wang, Xiaobo Jin

AI Summary

This paper introduces an attack-agnostic robustness metric for deep neural networks based on the spectral norm of the Fisher Information Matrix (FIM), which quantifies model sensitivity to input perturbations. The authors derive spectral bounds for several architectures, including VGG and ResNet, yielding a theoretical robustness ranking, and report that the metric correlates strongly with adversarial vulnerability. Efficient estimation algorithms make the metric scalable in both white-box and black-box settings, so it can serve as a diagnostic tool for models deployed in safety-critical applications.

Key Contribution

An attack-agnostic robustness metric, the spectral norm of the Fisher Information Matrix, with closed-form bounds that rank popular architectures by their sensitivity to input perturbations.

Abstract

The robustness of deep neural networks is crucial for safety-critical deployments, yet existing evaluation methods are often attack-dependent and lack interpretability. We propose a principled, attack-agnostic robustness metric based on the spectral norm of the Fisher Information Matrix (FIM), which quantifies the worst-case sensitivity of the model's output distribution to input perturbations. Theoretically, we establish that the FIM equals the variance of the input Jacobian and derive closed-form spectral bounds for common architectures, including VGG, ResNet, DenseNet, and Transformer, providing the first theoretical robustness ranking. To enable scalable evaluation, we develop efficient algorithms, including power iteration and Hutchinson-based estimation, that support both white-box and black-box settings. Extensive experiments across multiple datasets, including CIFAR, ImageNet, and medical images, and across multiple architectures show a strong correlation between our metric and adversarial vulnerability. Our framework serves as an interpretable diagnostic tool that complements attack-based evaluations, offering insights into architectural sensitivity and guiding the design of more robust models. Code is available at: https://github.com/franz-chang/SRP/.
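To make the metric concrete, here is a minimal sketch of power iteration and Hutchinson estimation applied to an input-space FIM. It is not taken from the authors' SRP repository. It assumes a toy linear softmax classifier with logits W x, for which the FIM with respect to the input is F = Wᵀ(diag(p) − p pᵀ)W with p = softmax(W x). The function names (fim_matvec, fim_spectral_norm, fim_trace_hutchinson) are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fim_matvec(W, p, v):
    """Return F @ v for F = W^T (diag(p) - p p^T) W without forming F.

    Assumption: linear softmax model, so the input Jacobian of the logits is W.
    """
    u = W @ v                    # Jacobian-vector product, shape (classes,)
    u = p * u - p * (p @ u)      # apply diag(p) - p p^T
    return W.T @ u               # vector-Jacobian product, shape (inputs,)

def fim_spectral_norm(W, p, iters=500, seed=0):
    """Estimate the largest eigenvalue of F by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = fim_matvec(W, p, v)
        lam = float(v @ w)       # Rayleigh quotient for the current unit vector
        norm = np.linalg.norm(w)
        if norm == 0.0:          # v lies in the null space of F
            break
        v = w / norm
    return lam

def fim_trace_hutchinson(W, p, samples=2000, seed=0):
    """Estimate trace(F) with Rademacher probes (Hutchinson estimator)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(samples):
        z = rng.choice([-1.0, 1.0], size=W.shape[1])
        total += float(z @ fim_matvec(W, p, z))
    return total / samples

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.standard_normal((5, 8))      # 5 classes, 8 input features (toy sizes)
    x = rng.standard_normal(8)
    p = softmax(W @ x)
    print("spectral norm estimate:", fim_spectral_norm(W, p))
    print("trace estimate:", fim_trace_hutchinson(W, p))
```

Both estimators touch F only through matrix-vector products. For a deep network, the same loops would in principle apply with the two products in fim_matvec replaced by automatic-differentiation Jacobian-vector and vector-Jacobian products. How the paper handles the black-box setting is not described in the abstract, so it is not sketched here.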

Interpretability & Mechanistic Interp | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
