May 6, 2026

arXiv:2605.04946

Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks

Xuan Qi, Yi Wei, Fanqi Yu, Furao Shen, Vittorio Murino, Cigdem Beyan

AI Summary

This paper analyzes how training-time batch normalization (BN) affects the geometry of continuous piecewise-affine networks. Conditioned on a mini-batch, BN acts as a recentering mechanism: each neuron gets a reference hyperplane through the batch centroid, and its switching hyperplanes are parallel translates whose offsets are expressed in batch-standardized coordinates and do not depend on the raw bias. The authors give explicit sufficient conditions under which BN increases expected local partition refinement in ReLU and more general piecewise-affine networks, offering a geometric interpretation of BN's function-level effects.

Key Contribution

Training-time batch normalization acts as a batch-conditional recentering mechanism that reshapes the affine-region partition near the data, a function-level effect that is distinct from its optimization benefits.

Abstract

Batch normalization (BN) is central to modern deep networks, but its effect on the realized function during training remains less understood than its optimization benefits. We study training-time BN in continuous piecewise-affine (CPA) networks through the geometry of switching hyperplanes and the induced affine-region partition. Conditioned on a mini-batch, we show that BN defines for each neuron a reference hyperplane through the batch centroid, and that breakpoint-switching hyperplanes are parallel translates whose offsets are expressed in batch-standardized coordinates and are independent of the raw bias. This yields an exact criterion for when a switching hyperplane intersects a local $\ell_\infty$ window and motivates a local region-density functional based on exact affine-region counts. Under explicit sufficient conditions, we show that BN increases expected local partition refinement in ReLU and more general piecewise-affine networks, and that this mechanism transfers locally through depth inside parent affine regions where the upstream representation map is an affine embedding. These results provide a function-level geometric account of training-time BN as a batch-conditional recentering mechanism near the data.
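The bias-independence and centroid claims in the abstract can be checked numerically for a single ReLU neuron. The sketch below is an illustration under stated assumptions, not the paper's construction: the symbols `w`, `b`, `gamma`, `beta`, `eps` and all function names are hypothetical, and BN is taken to be the usual per-batch standardization of the pre-activation $z = w^\top x + b$ followed by scale `gamma` and shift `beta`. Under that assumption the batch mean of $z$ absorbs $b$, so the switching set is $\{x : w^\top(x - \bar{x}) = -\beta\sigma/\gamma\}$, where $\bar{x}$ is the batch centroid and $\sigma$ is the batch standard deviation of $w^\top x$. The $\ell_\infty$ window test uses the standard dual-norm fact that a hyperplane meets an $\ell_\infty$ ball of radius $r$ exactly when the centre's residual is at most $r\,\|w\|_1$; it may differ in form from the criterion stated in the paper.

```python
import numpy as np

def bn_preactivation(X, w, b, gamma, beta, eps=1e-5):
    """Training-time BN of one neuron's pre-activation z = X @ w + b (assumed form)."""
    z = X @ w + b
    return gamma * (z - z.mean()) / np.sqrt(z.var() + eps) + beta

def switching_offset(X, w, gamma, beta, eps=1e-5):
    """Return (centroid, t) with ReLU switching hyperplane {x : w @ (x - centroid) = t}."""
    centroid = X.mean(axis=0)
    sigma = np.sqrt((X @ w).var() + eps)   # batch std of the projected inputs
    return centroid, -beta * sigma / gamma

def intersects_linf_window(w, centroid, t, x0, r):
    """True if the hyperplane meets the l_inf ball of radius r around x0 (dual-norm test)."""
    return abs(w @ (x0 - centroid) - t) <= r * np.abs(w).sum()

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(64, 5))   # one mini-batch, hypothetical data
w = rng.normal(size=5)
gamma, beta = 1.5, 0.4

# The raw bias cancels: two very different biases give the same BN output.
out_a = bn_preactivation(X, w, b=0.0, gamma=gamma, beta=beta)
out_b = bn_preactivation(X, w, b=7.0, gamma=gamma, beta=beta)
bias_independent = np.allclose(out_a, out_b)

# A point on the predicted hyperplane has BN output zero under the batch statistics.
centroid, t = switching_offset(X, w, gamma, beta)
x_on = centroid + t * w / (w @ w)
sigma = np.sqrt((X @ w).var() + 1e-5)
value_on_plane = gamma * (w @ (x_on - centroid)) / sigma + beta

# Window test: a small window on the hyperplane is cut, a distant one is not.
x_far = x_on + 10.0 * w / np.linalg.norm(w)
near_hit = intersects_linf_window(w, centroid, t, x_on, r=0.1)
far_hit = intersects_linf_window(w, centroid, t, x_far, r=0.1)
```

Setting `beta = 0` in this sketch gives `t = 0`, so the hyperplane passes through the batch centroid itself; nonzero `beta` only slides it along `w`, which is the "parallel translate" picture described above.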

Architecture Design (Transformers, SSMs, MoE)

Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
