Southwest University of Science and Technology, Mianyang, China
{D23090100052; sanshuaicui}@cityu.edu.mo; zengh5@mail2.sysu.edu.cn
*Corresponding author

Abstract

Existing neural network calibration methods often treat calibration as a static, post-hoc optimization task. However, this neglects the dynamic and temporal nature of real-world inference. Moreover, existing methods do not provide an intuitive interface enabling human operators to dynamically adjust model behavior under shifting conditions. In this work, we propose Knob, a framework that connects deep learning with classical control theory by mapping neural gating dynamics to a second-order mechanical system. By establishing correspondences between physical parameters—damping ratio ($\zeta$) and natural frequency ($\omega_n$)—and neural gating, we create a tunable “safety valve.” The core mechanism employs a logit-level convex fusion, functioning as an input-adaptive temperature scaling. It tends to reduce model confidence, particularly when model branches produce conflicting predictions. Furthermore, by imposing second-order dynamics (Knob-ODE), we enable a dual-mode inference: standard i.i.d. processing for static tasks, and state-preserving processing for continuous streams. Our framework allows operators to tune “stability” and “sensitivity” through familiar physical analogues. This paper presents an exploratory architectural interface; we focus on demonstrating the concept and validating its control-theoretic properties rather than claiming state-of-the-art calibration performance. Experiments on CIFAR-10-C validate the calibration mechanism and demonstrate that, in Continuous Mode, the gate responses are consistent with standard second-order control signatures (step settling and low-pass attenuation), paving the way for predictable human-in-the-loop tuning.

1 Introduction

Deep neural networks have achieved remarkable success across various domains. However, their “black box” nature remains problematic, especially regarding their ability to reliably estimate uncertainty. In safety-critical applications like autonomous driving or medical diagnosis, a model’s ability to express uncertainty—and to correct for overconfidence—is as crucial as its accuracy. The well-documented robustness–calibration paradox (Guo et al., 2017; Ovadia et al., 2019), where models become less calibrated as they become more robust, is a critical challenge (Roschewitz et al., 2025). Models that cannot faithfully represent their uncertainty under novel conditions are inherently unreliable.

Existing solutions to this problem typically fall into two categories: post-hoc methods and training-time regularization. Post-hoc techniques like Temperature Scaling (TS) (Guo et al., 2017) adjust confidence scores using a validation set, effectively fitting a single scalar to a dataset that may not resemble the test environment. Training-time methods, such as label smoothing (Liu et al., 2022) or Mixup (Noh et al., 2023), attempt to bake calibration into the weights. However, both approaches suffer from a fundamental limitation: opacity. They produce static solutions that are difficult for an end-user to interpret or adjust. If a deployed model behaves erratically, the operator has no intuitive lever to pull; the only recourse is often to retrain or recalibrate on new data, which is not always feasible.
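As a point of reference for the post-hoc route, the following minimal numpy sketch (our illustration; the grid-search fit and variable names are assumptions, whereas the original method optimizes the scalar against validation NLL) shows why a single fitted temperature is static: it is chosen once on held-out data and cannot adapt at test time.

```python
import numpy as np

def nll(logits, labels, T):
    """Average negative log-likelihood of softmax(logits / T)."""
    scaled = logits / T
    scaled = scaled - scaled.max(axis=1, keepdims=True)          # numerical stability
    log_probs = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    """Classic post-hoc TS: pick the single scalar T that minimizes validation NLL."""
    return min(grid, key=lambda T: nll(val_logits, val_labels, T))

# Usage (hypothetical arrays):
#   T_hat = fit_temperature(val_logits, val_labels)
#   calibrated_probs = softmax(test_logits / T_hat)
# The fitted T is frozen after calibration; if the test stream drifts away from the
# validation set, there is no run-time lever left to adjust.
```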
We propose a different paradigm: viewing neural network inference not as a static mapping, but as a dynamic system with a controllable “Volume Knob.” Just as an audio engineer adjusts compressors to manage volume spikes or a driver tunes suspension settings for different terrains, an AI operator should have intuitive means to control a model’s “conservativeness” or “responsiveness.” We argue that the language of classical mechanics—damping, stiffness, and inertia—provides the ideal vocabulary for this interface.

A more promising direction lies in architectural design. The superior calibration of architectures like Vision Transformers (ViT) (Minderer et al., 2021) suggests that calibration can be an intrinsic property of the model itself. However, such properties often emerge without a clear theoretical framework or interpretable control mechanisms. Drawing inspiration from the principled stability criteria in classical control theory (Franklin et al., 2019), we propose embedding the dynamics of physical systems directly into neural network architectures.

Our primary contribution is Knob, a physics-grounded architectural framework that establishes a formal, differentiable mapping between the parameters of a classical second-order damped mechanical system (e.g., its damping ratio $\zeta$ and natural frequency $\omega_n$) and the inference-time dynamics of a neural network. This mapping is enabled by a novel, physics-inspired neural layer that governs a gate’s response with interpretable control over its stability and speed, while Tustin discretization ensures step-size-independent stability. This gate performs a logit-level convex fusion, which we interpret as an input-adaptive temperature scaling mechanism that tends to suppress overconfidence. We instantiate this framework in a family of efficient methods (Knob) and illustrate our claims with a cohesive theory-evidence loop: the confidence moderation properties of convex fusion are examined in E-1, the gate’s learned behavior in E-2, and the second-order dynamics in E-3.

Scope of this work. This paper is exploratory in nature: we focus on introducing the architectural interface and demonstrating that the control-theoretic properties (damping, frequency response) are indeed functional. We do not claim state-of-the-art calibration; rather, we aim to open a new design axis where model behavior can be adjusted via physically meaningful parameters.

2 Related Work

The challenge of ensuring that neural network confidence scores accurately reflect true prediction correctness, particularly under distribution shift, has spurred extensive research. The field has predominantly advanced along two main trajectories: post-hoc calibration and training-time regularization.

Post-hoc Calibration. Post-hoc methods, such as the seminal Temperature Scaling (TS) (Guo et al., 2017), its input-adaptive extensions (Mozafari et al., 2018; Joy et al., 2023), and more complex density-based approaches like Dirichlet calibration (Kull et al., 2019) or Bayesian Binning into Quantiles (BBQ) (Naeini et al., 2015), adjust the outputs of a pre-trained model. Although computationally inexpensive, their dependency on representative validation sets renders them vulnerable to performance deterioration under non-stationary real-world data streams (Ovadia et al., 2019).
Training-time Regularization. Training-time regularization techniques, including label smoothing (Liu et al., 2022) and data augmentation strategies like Mixup (Noh et al., 2023), aim to build in calibration from the outset by discouraging overconfident predictions during optimization. These methods can improve in-distribution calibration but often require careful hyperparameter tuning and can increase training costs, limiting their agility.

Architectural Design for Calibration. Our work carves a third path: architectural design for calibration. While some studies have shown that certain architectures like Vision Transformers (ViT) inherently offer better calibration (Minderer et al., 2021), many such findings are empirical observations rather than the result of principled design. We distinguish our approach by explicitly importing concepts from classical control theory (Franklin et al., 2019). Instead of relying on emergent properties, we engineer the network’s gating dynamics to follow a prescribed second-order damped system, providing an interpretable, physics-grounded mechanism for confidence modulation that is intrinsic to the model’s structure.

Physics-Informed Deep Learning. Integrating physical priors into deep learning typically involves solving partial differential equations (PDEs) or modeling physical systems. Neural ODEs (Chen et al., 2018) bridged the gap between residual networks and dynamical systems. However, most prior work uses AI to solve physics. In contrast, we use physics to constrain AI. By imposing second-order damping dynamics on the gating mechanism, we leverage the stability guarantees of classical mechanics to regulate the volatile confidence estimates of deep networks.

Second-order Neural ODEs and Momentum Dynamics. Several works explore second-order dynamics in neural ODEs (e.g., heavy-ball style formulations), primarily to improve training efficiency or feature evolution. Knob differs in intent: we use a second-order prior to regulate inference-time gating and expose $(\zeta,\omega_n)$ as user-interpretable control surfaces for run-time behavior shaping, rather than optimizing the training trajectory.

Oscillatory and Damping-based Priors. Damped harmonic oscillator models are widely used in control theory to describe stable responses under noise and perturbations. In our setting, we adopt the same control primitives (damping ratio and bandwidth) but apply them to a neural gate, allowing us to probe step and frequency responses (E-3) as diagnostic signatures of the embedded prior.

Comparison with MoE and Ensembling. Our approach shares similarities with Mixture of Experts (MoE) (Shazeer et al., 2017) and Deep Ensembles (Lakshminarayanan et al., 2017). However, standard MoE gates are typically trained to maximize capacity or sparsity and are theoretically unconstrained. Deep Ensembles rely on averaging independent models, which is computationally expensive and lacks run-time adjustability. Knob differs by explicitly constraining the gating signal to follow a differential equation, ensuring that the mixing weights evolve smoothly and predictably according to user-intelligible parameters $(\zeta,\omega_n)$.

3 Method: The Knob Framework

This section details the Knob framework, proceeding from its high-level structure to its theoretical underpinnings and dynamic properties.
3.1 Architectural Overview and Core Terminology

The core of the Knob framework is a logit-level convex fusion mechanism that operates on two parallel logit streams, $\bm{z}_{\mathrm{static}}$ and $\bm{z}_{\mathrm{dyn}}$, supplied by a dual-stream backbone (Fig. 1). A sample-wise, learned scalar gate $g(x)\in[0,1]$ computes a convex combination of these two streams:

$$\bm{z}_{\mathrm{fuse}} \;=\; g(x)\,\bm{z}_{\mathrm{dyn}} \;+\; \bigl(1-g(x)\bigr)\,\bm{z}_{\mathrm{static}}. \qquad (1)$$

This architecture ensures that the fused logits $\bm{z}_{\mathrm{fuse}}$ lie on the line segment connecting the two source logits, a property that intrinsically limits overconfidence, as we will formalize in §3.3. To provide stable and interpretable control over the gate’s behavior, its dynamics are governed by a differentiable second-order system, detailed in §3.2.

Figure 1: The Knob framework as a physics-inspired interface. Left (Dual-Stream Backbone): A shared encoder feeds two lightweight projection heads, producing a robust “Static” branch and a more sensitive “Dynamic” branch, yielding complementary logit vectors $\bm{z}_{\mathrm{static}}$ and $\bm{z}_{\mathrm{dyn}}$. Center (Physics Engine): The gating mechanism is modeled as a mass-spring-damper system (normalized to unit mass). A network-predicted reference input $u^{*}(x)$ drives the system; its response is governed by two interpretable control parameters—Natural Frequency ($\omega_n$), controlling sensitivity/bandwidth, and Damping Ratio ($\zeta$), controlling stability/conservativeness—which act as tunable “knobs” for the operator. Right (Convex Fusion): The gate value $g(x)\in[0,1]$, determined by the mass displacement, performs a convex combination of the two logit branches, naturally limiting model overconfidence.

For clarity, we systematically define the family of methods evaluated in this paper:
• Static-only (Static): Uses only the static branch logits as a baseline.
• Channel-attention fusion (Attention): A non-convex counterpart using standard attention.
• Input-adaptive convex gate (Knob-IA): The core proposal, using the learned gate $g(x)$ for fusion.
• EMA-smoothed gate (ODE-Lite): A lightweight version using a first-order exponential moving average.
• Second-order damped gate (Knob-ODE): The full physics-inspired dynamics.
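To make the fusion in Eq. (1) concrete, the following minimal sketch (our illustration; the toy logits and helper names are not from the paper) fuses two branch logit vectors for several gate values and reports the resulting top-1 confidence.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def convex_fuse(z_dyn, z_static, g):
    """Eq. (1): logit-level convex fusion with a scalar gate g in [0, 1]."""
    assert 0.0 <= g <= 1.0
    return g * z_dyn + (1.0 - g) * z_static

# Toy 3-class example in which the two branches disagree on the predicted class.
z_static = np.array([4.0, 1.0, 0.0])   # confident in class 0
z_dyn    = np.array([0.5, 3.5, 0.0])   # confident in class 1

for g in (0.0, 0.25, 0.5, 0.75, 1.0):
    p = softmax(convex_fuse(z_dyn, z_static, g))
    print(f"g={g:.2f}  top-1 confidence={p.max():.3f}")

# Because the fused logits stay on the segment between the branch logits,
# branch disagreement pulls the top-1 confidence well below that of either branch alone.
```

In Knob, the gate value is not swept by hand but predicted per sample and governed by the second-order dynamics introduced next.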
3.2 The Second-Order Damped Gate: Physical Control Parameters

To prevent the learned gate $g(x)$ from introducing instability, we constrain its dynamics by modeling its evolution as a classical second-order damped system. This approach injects a strong, interpretable physical prior into the model’s inference-time behavior, drawing inspiration from neural differential equations (Chen et al., 2018).

From physical equations to a state-space representation. Instead of directly modeling the gate value $g\in[0,1]$, we impose dynamics on a latent variable $u$ and obtain the gate via a sigmoid mapping, $g=\sigma(u)$. The evolution of $u$ follows a standard second-order ODE:

$$\ddot{u} + 2\zeta\omega_n\,\dot{u} + \omega_n^2\,u \;=\; \omega_n^2\,u^{*}(x), \qquad (2)$$

where $u^{*}(x)$ is a network-predicted “target displacement,” and $\zeta$ (damping ratio) and $\omega_n$ (natural frequency) are learnable physical parameters governing the system’s response.

Decoupling content from dynamics. A key benefit of Eq. (2) is that it separates what the gate wants to do from how it gets there. The reference input $u^{*}(x)$ (learned from data) specifies the desired operating point for the gate, while the physical parameters $(\zeta,\omega_n)$ explicitly shape the transition trajectory (e.g., overshoot, settling speed, and bandwidth). This makes the inference-time behavior tunable through interpretable control knobs, unlike standard gating mechanisms whose dynamics are implicitly entangled in weight matrices. The key innovation is the re-interpretation of these coefficients as user-facing control parameters:
• $\zeta$ (Damping Ratio) → Conservativeness/Stability: This parameter controls the system’s resistance to change. A high $\zeta$ (overdamped) corresponds to a conservative policy where the gate changes position slowly and deliberately, avoiding “knee-jerk” reactions to noisy inputs. A low $\zeta$ (underdamped) allows for faster reaction but risks “ringing” or instability.
• $\omega_n$ (Natural Frequency) → Sensitivity/Bandwidth: This controls how fast the system can respond. A low $\omega_n$ acts as a low-pass filter, ignoring high-frequency input noise (or adversarial perturbations) and focusing on the steady-state signal. A high $\omega_n$ makes the system highly sensitive to every input fluctuation.

For efficient neural implementation, we convert this to a first-order state-space representation with state vector $\bm{x}=[u,\dot{u}]^{\top}$:

$$\dot{\bm{x}} \;=\; A\,\bm{x} \;+\; B\,u^{*}(x), \qquad A \;=\; \begin{bmatrix} 0 & 1 \\ -\omega_n^2 & -2\zeta\omega_n \end{bmatrix}, \qquad B \;=\; \begin{bmatrix} 0 \\ \omega_n^2 \end{bmatrix}. \qquad (3)$$

The matrix $A$ governs the system’s internal dynamics, while $B$ defines how the input $u^{*}(x)$ influences the state.

Tustin discretization and the forward pass. For discrete-time neural computation, we use the Tustin (bilinear) transform (Smith, 1997) to discretize the continuous system. This method preserves stability and maps the continuous matrices $(A,B)$ to their discrete counterparts $(A_d,B_d)$:

$$A_d \;=\; \bigl(I - \tfrac{\Delta t}{2}A\bigr)^{-1}\bigl(I + \tfrac{\Delta t}{2}A\bigr), \qquad B_d \;=\; \bigl(I - \tfrac{\Delta t}{2}A\bigr)^{-1}\Delta t\,B. \qquad (4)$$

Proposition 1 (Step-size-independent stability of Tustin discretization). If the continuous-time system parameters satisfy $\zeta>0$ and $\omega_n>0$, then the eigenvalues of the discretized matrix $A_d$ obtained via the Tustin transform obey $|\lambda_i|<1$ for all $\Delta t>0$. Hence the resulting discrete-time system is stable independently of the step size (see Appendix D for proof).

The forward pass for Knob-ODE is governed by the linear recursion $\bm{x}_t = A_d\bm{x}_{t-1} + B_d u_t^{*}$. The final gate value is $g_t = \sigma\bigl(\bm{x}_t[0]\bigr)$.

Dual-Mode Inference: Static vs. Continuous. Crucially, our framework supports two inference modes depending on the application context (a minimal sketch follows this list):
1. Reset Mode (I.I.D. Tasks): For standard benchmarks like ImageNet or CIFAR, where samples are independent, we reset the internal state $\bm{x}$ to zero for each new input ($\bm{x}_0=\bm{0}$). In this mode, the mechanism acts as a single-step, input-adaptive gate without temporal memory.
2. Continuous Mode (Stream Processing): For time-series data, video streams, or our dynamic probe experiments (E-3), the state $\bm{x}_t$ is preserved across time steps. This allows the “physical inertia” of the gate to filter out high-frequency noise and enforce temporal consistency, strictly adhering to the specified damping dynamics.
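As a companion to Eqs. (3)-(4) and the recursion above, the sketch below (our illustration; parameter values, defaults, and function names are assumptions rather than the released implementation) discretizes the oscillator with the Tustin transform and runs the gate in either mode.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tustin_discretize(zeta, omega_n, dt=1.0):
    """Eqs. (3)-(4): continuous (A, B) of the unit-mass damped oscillator, then the bilinear transform."""
    A = np.array([[0.0, 1.0],
                  [-omega_n**2, -2.0 * zeta * omega_n]])
    B = np.array([[0.0],
                  [omega_n**2]])
    M = np.linalg.inv(np.eye(2) - 0.5 * dt * A)
    Ad = M @ (np.eye(2) + 0.5 * dt * A)
    Bd = M @ (dt * B)
    return Ad, Bd

def knob_gate(u_star_seq, zeta=1.0, omega_n=0.8, dt=1.0, continuous=True):
    """Gate recursion x_t = Ad x_{t-1} + Bd u*_t, with g_t = sigmoid(x_t[0])."""
    Ad, Bd = tustin_discretize(zeta, omega_n, dt)
    x = np.zeros((2, 1))
    gates = []
    for u_star in u_star_seq:
        if not continuous:            # Reset Mode: one update per i.i.d. sample
            x = np.zeros((2, 1))
        x = Ad @ x + Bd * u_star      # Continuous Mode: state carries over
        gates.append(sigmoid(x[0, 0]))
    return np.array(gates)

# Step input: an overdamped gate (zeta=2.0) settles without overshoot,
# while an underdamped one (zeta=0.3) rings before converging.
u_step = np.ones(40) * 2.0
print(knob_gate(u_step, zeta=2.0)[:5], knob_gate(u_step, zeta=0.3)[:5])

# Stability check in the spirit of Proposition 1: |eig(Ad)| < 1 even for a large step size.
Ad, _ = tustin_discretize(zeta=0.7, omega_n=1.2, dt=10.0)
print(np.abs(np.linalg.eigvals(Ad)))
```

With $\bm{x}_0=\bm{0}$ and a single update (Reset Mode), the state reduces to $B_d u^{*}$, so the first component applies a fixed contraction to the raw command, consistent with the shrinkage effect discussed in the next paragraph.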
Unless otherwise stated, quantitative metrics (Table 2) use Reset Mode, while dynamic analyses (Fig. 5) use Continuous Mode. In our implementation, we use a default time step of $\Delta t=1$.

What ODE priors do in Reset Mode. In Reset Mode, the state is initialized as $\bm{x}_0=\bm{0}$ and we apply a single update per sample. Even without temporal memory, the discretized dynamics still induce a parametric shrinkage of the raw command $u^{*}(x)$ (and thus of $g$), acting as a lightweight, physically motivated regularizer. In Continuous Mode, preserving $\bm{x}_t$ additionally yields genuine temporal smoothing.

Contrast to standard neural gates. Unlike common gates (e.g., sigmoid gates in recurrent units) whose dynamics are implicit in learned weights, Knob parameterizes the gate dynamics explicitly with a second-order template and stabilizes the discretization via the Tustin transform. This yields controllable response regimes (overdamped/critically damped/underdamped) and an interpretable bandwidth through $(\zeta,\omega_n)$.

3.3 Interpretation of Convex Fusion

The logit-level convex fusion (Eq. 1) provides a geometric mechanism for confidence moderation. We formalize this behavior under explicit Top-1/Top-2 conditions.

Proposition 2 (Convex Fusion Contracts the Top-2 Margin). Let $\bm{z}_1,\bm{z}_2\in\mathbb{R}^K$ and $g\in[0,1]$. Assume the two branches agree on the predicted class $k$