Apr 22, 2026arXiv:2604.20368

LaplacianFormer:Rethinking Linear Attention with Laplacian Kernel

Zhe Feng, Sen Lian, Changwei Wang, Muyang Zhang, Tianlong Tan, Rongtao Xu, Weiliang Meng, Xiaopeng Zhang

AI Summary

This paper introduces LaplacianFormer, a Transformer variant using a Laplacian kernel as a theoretically grounded alternative to softmax attention, addressing the limitations of Gaussian kernel approximations. To mitigate expressiveness loss from low-rank approximations, they propose an injective feature map that preserves token information. By combining a Nyström approximation with Newton-Schulz iteration and custom CUDA implementations, LaplacianFormer achieves competitive performance-efficiency trade-offs on ImageNet.

Key Contribution

Laplacian kernels offer a theoretically sound and empirically superior alternative to Gaussian kernels for linear attention, finally bridging the gap between efficient computation and expressive power in Transformers.

Abstract

The quadratic complexity of softmax attention presents a major obstacle for scaling Transformers to high-resolution vision tasks. Existing linear attention variants often replace the softmax with Gaussian kernels to reduce complexity, but such approximations lack theoretical grounding and tend to oversuppress mid-range token interactions. We propose LaplacianFormer, a Transformer variant that employs a Laplacian kernel as a principled alternative to softmax, motivated by empirical observations and theoretical analysis. To address expressiveness degradation under low-rank approximations, we introduce a provably injective feature map that retains fine-grained token information. For efficient computation, we adopt a Nystr\"om approximation of the kernel matrix and solve the resulting system using Newton--Schulz iteration, avoiding costly matrix inversion and SVD. We further develop custom CUDA implementations for both the kernel and solver, enabling high-throughput forward and backward passes suitable for edge deployment. Experiments on ImageNet show that LaplacianFormer achieves strong performance-efficiency trade-offs while improving attention expressiveness.

Architecture Design (Transformers, SSMs, MoE)Computer Vision Training Efficiency & Optimization

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

LaplacianFormer:Rethinking Linear Attention with Laplacian Kernel

Related Papers