Institut Polytechnique de Paris, LASTIG, Univ Gustave Eiffel, University of Padova
Apr 7, 2026
arXiv:2604.06129

PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer

David Picard, Nicolas Dufour, Lucas Degeorge, Arijit Ghosh, Davide Allegro, Tom Ravaud, Yohann Perron, Corentin Sautier, Zeynep Sonat Baltaci, Fei Meng, Syrine Kalleli, Marta López-Rauhut, Thibaut Loiseau, Ségolène Albouy, Raphael Baena, Elliot Vincent, Loic Landrieu

AI Summary

The paper introduces the Polynomial Mixer (PoM), a linear-complexity token mixing mechanism that replaces self-attention by aggregating tokens into a compact representation via a learned polynomial function. PoM provably maintains the universal sequence-to-sequence approximation property of transformers. Experiments across five domains show that PoM matches attention performance while significantly reducing computational cost on long sequences.

Key Contribution

The Polynomial Mixer (PoM) addresses attention's quadratic scaling in sequence length: it matches attention performance at linear complexity across five diverse domains.

Abstract

This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a compact representation through a learned polynomial function, from which each token retrieves contextual information. We prove that PoM satisfies the contextual mapping property, ensuring that transformers equipped with PoM remain universal sequence-to-sequence approximators. We replace standard self-attention with PoM across five diverse domains: text generation, handwritten text recognition, image generation, 3D modeling, and Earth observation. PoM matches the performance of attention-based models while drastically reducing computational cost when working with long sequences. The code is available at https://github.com/davidpicard/pom.
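
The aggregate-then-retrieve idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that). The specific choices here are assumptions made purely for illustration: mean pooling of element-wise powers as the polynomial function, a sigmoid gate for retrieval, and the hypothetical names `aggregate`, `retrieve`, `polynomial_mixer`, `W_poly`, `W_gate`, and `W_out`.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters (illustrative only)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def aggregate(tokens, W_poly, degree):
    """Pool all tokens into one fixed-size state (assumed form: mean of powers)."""
    n, hidden = len(tokens), len(W_poly)
    state = [0.0] * (hidden * degree)
    for x in tokens:                      # one pass over the sequence
        h = matvec(W_poly, x)
        for p in range(1, degree + 1):    # element-wise powers 1..degree
            for j, v in enumerate(h):
                state[(p - 1) * hidden + j] += (v ** p) / n
    return state


def retrieve(x, state, W_gate, W_out):
    """Let one token read from the shared state through a sigmoid gate."""
    gate = [1.0 / (1.0 + math.exp(-g)) for g in matvec(W_gate, x)]
    return matvec(W_out, [g * s for g, s in zip(gate, state)])


def polynomial_mixer(tokens, W_poly, W_gate, W_out, degree):
    """Toy linear-time token mixer: aggregate once, then retrieve per token."""
    state = aggregate(tokens, W_poly, degree)
    return [retrieve(x, state, W_gate, W_out) for x in tokens]


# Small demo with made-up sizes.
rng = random.Random(0)
dim, hidden, degree, n_tokens = 4, 3, 2, 6
W_poly = random_matrix(hidden, dim, rng)
W_gate = random_matrix(hidden * degree, dim, rng)
W_out = random_matrix(dim, hidden * degree, rng)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]
mixed = polynomial_mixer(tokens, W_poly, W_gate, W_out, degree)
print(len(mixed), len(mixed[0]))  # one output vector of size dim per token
```

In this sketch each token is visited once during aggregation and once during retrieval, so the cost grows linearly with the number of tokens, and the shared state has a fixed size (`hidden * degree`) regardless of sequence length. No pairwise token-to-token scores are ever computed, which is where the quadratic cost of self-attention comes from.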

Architecture Design (Transformers, SSMs, MoE); Natural Language Processing; Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
