Feb 23, 2026 · arXiv:2602.19528

Beyond Accuracy: A Unified Random Matrix Theory Diagnostic Framework for Crash Classification Models

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

AI Summary

This paper introduces a Random Matrix Theory (RMT)-based spectral diagnostic framework for detecting overfitting in crash classification models across diverse ML architectures. The framework uses Heavy-Tailed Self-Regularization (HTSR) to analyze weight matrices, out-of-fold increment matrices, empirical Hessians, induced affinity matrices, and graph Laplacians. Experiments on two Iowa DOT crash classification datasets show that the power-law exponent $\alpha$ derived from spectral analysis correlates with model quality and expert agreement, which enables $\alpha$-based early stopping and spectral model selection.
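To make the power-law exponent concrete, here is a minimal sketch that fits $\alpha$ to the tail of an eigenvalue spectrum with the standard continuous maximum-likelihood estimator, $\hat{\alpha} = 1 + n / \sum_i \ln(\lambda_i / \lambda_{\min})$. The summary does not say which estimator the authors use, so this is an assumption. The names `fit_alpha`, `sample_power_law`, and `tail_fraction` are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_alpha(eigenvalues, tail_fraction=0.5):
    """Continuous power-law MLE on the largest `tail_fraction` of eigenvalues.

    Assumption (not from the paper): the tail density behaves like
    p(x) ~ x^(-alpha) for x >= x_min, where x_min is the smallest
    eigenvalue kept in the tail.
    """
    positive = sorted(v for v in eigenvalues if v > 0)
    if len(positive) < 2:
        raise ValueError("need at least two positive eigenvalues")
    k = max(2, int(len(positive) * tail_fraction))
    tail = positive[-k:]          # the k largest eigenvalues
    x_min = tail[0]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("tail eigenvalues are all identical")
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, n, x_min=1.0, seed=0):
    """Draw n synthetic values from a power law by inverse-CDF sampling."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


if __name__ == "__main__":
    # Synthetic spectrum with a known exponent, used only as a sanity check.
    spectrum = sample_power_law(alpha=3.0, n=20000)
    print(round(fit_alpha(spectrum, tail_fraction=0.5), 2))
```

In practice the input would be the eigenvalues of whichever matrix the framework assigns to a model family (for example, a layer's $W^\top W$). The synthetic spectrum here only checks that the estimator recovers a known exponent.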

Key Contribution

Forget accuracy: a spectral "fingerprint" derived from Random Matrix Theory can reveal hidden overfitting in your model, even when standard metrics look good.

Abstract

Crash classification models in transportation safety are typically evaluated using accuracy, F1, or AUC, metrics that cannot reveal whether a model is silently overfitting. We introduce a spectral diagnostic framework grounded in Random Matrix Theory (RMT) and Heavy-Tailed Self-Regularization (HTSR) that spans the ML taxonomy: weight matrices for BERT/ALBERT/Qwen2.5, out-of-fold increment matrices for XGBoost/Random Forest, empirical Hessians for Logistic Regression, induced affinity matrices for Decision Trees, and Graph Laplacians for KNN. Evaluating nine model families on two Iowa DOT crash classification tasks (173,512 and 371,062 records, respectively), we find that the power-law exponent $\alpha$ provides a structural quality signal: well-regularized models consistently yield $\alpha$ within $[2, 4]$ (mean $2.87 \pm 0.34$), while overfit variants show $\alpha < 2$ or spectral collapse. We observe a strong rank correlation between $\alpha$ and expert agreement (Spearman $\rho = 0.89$, $p < 0.001$), suggesting spectral quality captures model behaviors aligned with expert reasoning. We propose an $\alpha$-based early stopping criterion and a spectral model selection protocol, and validate both against cross-validated F1 baselines. Sparse Lanczos approximations make the framework scalable to large datasets.
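The abstract gives the $[2, 4]$ band and the $\alpha < 2$ overfitting signal but not the details of the early stopping criterion. The sketch below shows one way such a rule could look, assuming $\alpha$ is re-estimated after every epoch and training stops once it stays below the lower bound for a few consecutive epochs. The function names, the `patience` parameter, and the status labels are hypothetical and not taken from the paper.

```python
def alpha_status(alpha, low=2.0, high=4.0):
    """Label an exponent relative to the [low, high] band from the abstract."""
    if alpha < low:
        return "overfit-suspect"   # abstract: overfit variants show alpha < 2
    if alpha > high:
        return "outside-band"      # abstract does not interpret alpha > 4
    return "in-band"


def early_stop_epoch(alphas, low=2.0, patience=2):
    """Return the epoch index at which to stop, or None if never triggered.

    Assumed rule (illustrative only): stop at the first epoch where alpha
    has been below `low` for `patience` consecutive epochs.
    """
    below = 0
    for epoch, alpha in enumerate(alphas):
        below = below + 1 if alpha < low else 0
        if below >= patience:
            return epoch
    return None


if __name__ == "__main__":
    # Made-up per-epoch exponents, for illustration only.
    history = [3.6, 3.1, 2.7, 2.2, 1.9, 2.1, 1.8, 1.7, 1.5]
    print(early_stop_epoch(history))
    print(alpha_status(2.87))
```

The `patience` counter is a design choice to avoid stopping on a single noisy estimate. A rule that also reacts to spectral collapse would need a separate check that the abstract does not specify.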

Eval Frameworks & Benchmarks · Interpretability & Mechanistic Interp · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
