The paper introduces the Gompertz Linear Unit (GoLU), a novel self-gated activation function defined as $x \cdot \text{Gompertz}(x)$ where $\text{Gompertz}(x) = e^{-e^{-x}}$. GoLU leverages the right-skewed asymmetry of the Gompertz function to reduce variance in the latent space more effectively than GELU and Swish, while maintaining robust gradient flow. Empirical evaluations across image classification, language modeling, and other tasks demonstrate that GoLU outperforms state-of-the-art activation functions.
A new activation function, GoLU, leverages the Gompertz function's asymmetry to outperform GELU and Swish across diverse deep learning tasks.
Activation functions are fundamental elements of deep learning architectures as they significantly influence training dynamics. ReLU, while widely used, is prone to the dying neuron problem, which has been mitigated by variants such as LeakyReLU, PReLU, and ELU that better handle negative neuron outputs. Recently, self-gated activations like GELU and Swish have emerged as state-of-the-art alternatives, leveraging their smoothness to ensure stable gradient flow and prevent neuron inactivity. In this work, we introduce the Gompertz Linear Unit (GoLU), a novel self-gated activation function defined as $\mathrm{GoLU}(x) = x \, \mathrm{Gompertz}(x)$, where $\mathrm{Gompertz}(x) = e^{-e^{-x}}$. The GoLU activation leverages the right-skewed asymmetry of the Gompertz function to reduce variance in the latent space more effectively than GELU and Swish, while preserving robust gradient flow. Extensive experiments across diverse tasks, including Image Classification, Language Modeling, Semantic Segmentation, Object Detection, Instance Segmentation, and Diffusion, highlight GoLU's superior performance relative to state-of-the-art activation functions, establishing it as a robust alternative.
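The closed-form definition above is easy to prototype. Below is a minimal sketch of GoLU as a PyTorch module, written directly from the formula $x \cdot e^{-e^{-x}}$; the class name and eager-mode implementation are assumptions for illustration, not the authors' official (possibly fused-kernel) release.

```python
import torch
from torch import nn


class GoLU(nn.Module):
    """Gompertz Linear Unit: GoLU(x) = x * Gompertz(x), with Gompertz(x) = exp(-exp(-x)).

    Minimal eager-mode sketch based on the formula in the abstract; an optimized
    implementation might fuse these elementwise ops into a single kernel.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gompertz gate: a right-skewed, sigmoid-like curve with values in (0, 1).
        gate = torch.exp(-torch.exp(-x))
        return x * gate


# Usage: a drop-in replacement wherever GELU or SiLU (Swish) would be used.
act = GoLU()
y = act(torch.linspace(-3.0, 3.0, steps=7))
```

Because autograd differentiates the closed form directly, this sketch needs no custom backward pass.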