This paper introduces a deterministic formulation of dropout as an additive regularizer integrated into the training loss of Transformer architectures, eliminating the need for stochastic masking. By deriving explicit regularization terms for the attention (query, key, and value) and feed-forward components of Transformers, the authors provide fine-grained control over regularization strength. Experiments on image classification, temporal action detection, and audio classification show that this explicit dropout approach matches or exceeds the performance of conventional stochastic dropout, suggesting potential benefits for training stability and interpretability.
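For intuition, the classical single-layer case shows how a stochastic mask becomes an additive penalty. This is the textbook result for a linear model with squared loss and inverted dropout (drop rate p) on its inputs. It is not the paper's Transformer-specific derivation. The symbols w, x, y, m, and p are introduced here for illustration. Because the masks m_j are independent and the rescaled input is unbiased, the expected loss splits into a data term plus a variance term:

```latex
\begin{aligned}
\tilde{x}_j &= \frac{m_j}{1-p}\,x_j, \qquad m_j \sim \mathrm{Bernoulli}(1-p)\ \text{i.i.d.}, \qquad
\mathbb{E}[\tilde{x}_j] = x_j, \qquad \operatorname{Var}[\tilde{x}_j] = \frac{p}{1-p}\,x_j^2,\\[4pt]
\mathbb{E}_m\!\left[(y - w^\top \tilde{x})^2\right]
&= (y - w^\top x)^2 + \operatorname{Var}\!\left[w^\top \tilde{x}\right]
= (y - w^\top x)^2 + \frac{p}{1-p}\sum_j w_j^2 x_j^2 .
\end{aligned}
```

In this simple setting the second term is an explicit, deterministic regularizer whose strength is set by the dropout rate p. Multiplying it by a separate coefficient is one way to decouple the regularization strength from p.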
Explicit dropout matches or exceeds the performance of stochastic dropout without random masking, offering more direct control over regularization in Transformer models.
Dropout is a widely used regularization technique in deep learning, but its effects are typically realized through stochastic masking rather than explicit optimization objectives. We propose a deterministic formulation that expresses dropout as an additive regularizer directly incorporated into the training loss. The framework derives explicit regularization terms for Transformer architectures, covering the attention query, key, and value projections and the feed-forward components, each with an independently controllable strength. This formulation removes the reliance on stochastic perturbations while providing clearer, more fine-grained control over regularization strength. Experiments on image classification, temporal action detection, and audio classification show that explicit dropout matches or outperforms conventional implicit methods, with consistent gains when applied to attention and feed-forward network layers. Ablation studies show that performance remains stable and that regularization can be controlled through the regularization coefficients and dropout rates. Overall, explicit dropout offers a practical and interpretable alternative to stochastic regularization while maintaining architectural flexibility across diverse tasks.
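The sketch below illustrates the general idea of trading a stochastic mask for a deterministic loss term. It assumes the classical linear-regression case, not the paper's attention or feed-forward terms. For inverted dropout with rate p on the inputs, the expected squared loss equals the plain squared loss plus p/(1-p) times the sum of (w_j x_j)^2. The function names and the coefficient `lam` are hypothetical illustrations. The sketch checks the closed form against a Monte Carlo average over random masks.

```python
import random

def squared_loss(w, x, y):
    """Plain squared error of a linear model w.x against target y."""
    pred = sum(wj * xj for wj, xj in zip(w, x))
    return (y - pred) ** 2

def explicit_dropout_penalty(w, x, p):
    """Closed-form term p/(1-p) * sum_j (w_j x_j)^2 that inverted dropout
    with drop rate p adds to the expected squared loss (linear case only)."""
    return p / (1.0 - p) * sum((wj * xj) ** 2 for wj, xj in zip(w, x))

def explicit_dropout_loss(w, x, y, p, lam=1.0):
    """Deterministic objective: data loss plus a lam-weighted dropout penalty.
    lam = 1.0 recovers the exact expectation; other values (a hypothetical
    knob) rescale the regularization strength independently of p."""
    return squared_loss(w, x, y) + lam * explicit_dropout_penalty(w, x, p)

def monte_carlo_dropout_loss(w, x, y, p, n_samples=200_000, seed=0):
    """Average squared loss under stochastic inverted dropout on the inputs."""
    rng = random.Random(seed)
    keep = 1.0 - p
    total = 0.0
    for _ in range(n_samples):
        # Each input survives with probability 1 - p and is rescaled by 1/(1 - p).
        x_tilde = [xj / keep if rng.random() < keep else 0.0 for xj in x]
        total += squared_loss(w, x_tilde, y)
    return total / n_samples

if __name__ == "__main__":
    w, x, y, p = [0.5, -1.0, 2.0], [1.0, 2.0, -0.5], 0.3, 0.2
    print("explicit:   ", explicit_dropout_loss(w, x, y, p))
    print("monte carlo:", monte_carlo_dropout_loss(w, x, y, p))
```

With these values the explicit objective is 7.84 + 0.25 * 5.25 = 9.1525. The sampled average should land close to it. The explicit version needs one deterministic evaluation instead of many random masks. That is the practical appeal the abstract describes, although the paper's Transformer terms are more involved than this linear case.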