This paper introduces a taxonomy of structured operators that go beyond standard convolution for learning-based image processing. The taxonomy categorizes operators into five families: decomposition-based, adaptive weighted, basis-adaptive, integral/kernel, and attention-based operators, each offering different structural properties compared to convolution. By analyzing these operators across dimensions like linearity, locality, and computational cost, the paper provides guidance on their suitability for various image processing tasks and highlights open research challenges.
Convolution's reign in image processing may be ending: this paper maps out five families of structured operators that could surpass its limitations in capturing complex image properties.
The convolution operator is the fundamental building block of modern convolutional neural networks (CNNs), owing to its simplicity, translational equivariance, and efficient implementation. However, its structure as a fixed, linear, locally averaging operator limits its ability to capture structured signal properties such as low-rank decompositions, adaptive basis representations, and non-uniform spatial dependencies. This paper presents a systematic taxonomy of operators that extend or replace the standard convolution in learning-based image processing pipelines. We organise the landscape of alternative operators into five families: (i) decomposition-based operators, which separate structural and noise components through singular value or tensor decompositions; (ii) adaptive weighted operators, which modulate kernel contributions as a function of spatial position or signal content; (iii) basis-adaptive operators, which optimise the analysis bases together with the network weights; (iv) integral and kernel operators, which generalise the convolution to position-dependent and non-linear kernels; and (v) attention-based operators, which relax the locality assumption entirely. For each family, we provide a formal definition, a discussion of its structural properties with respect to the convolution, and a critical analysis of the tasks for which the operator is most appropriate. We further provide a comparative analysis of all families across relevant dimensions (linearity, locality, equivariance, computational cost, and suitability for image-to-image and image-to-label tasks) and outline the open challenges and future directions of this research area.
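To make the contrast between a fixed convolution and an adaptive weighted operator (family ii) concrete, here is a minimal 1-D NumPy sketch. It is an illustration only, not the paper's formal definition: the function names `fixed_conv` and `adaptive_weighted`, the bilateral-style weighting, and the parameters `radius`, `sigma_s`, and `sigma_r` are all assumptions introduced for this example.

```python
import numpy as np


def fixed_conv(x, kernel):
    """Fixed operator: the same kernel is applied at every position.

    Implemented as a sliding dot product with edge padding; for the
    symmetric kernel used below this coincides with convolution.
    """
    r = len(kernel) // 2
    xp = np.pad(x, r, mode="edge")
    return np.array([np.dot(xp[i:i + len(kernel)], kernel) for i in range(len(x))])


def adaptive_weighted(x, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Hypothetical adaptive weighted operator (bilateral-style).

    The weights depend on spatial offset AND on how close each neighbour's
    value is to the centre value, so the effective kernel changes with
    the signal content.
    """
    offsets = np.arange(-radius, radius + 1)
    spatial = np.exp(-offsets**2 / (2 * sigma_s**2))
    xp = np.pad(x, radius, mode="edge")
    out = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        window = xp[i:i + 2 * radius + 1]
        similarity = np.exp(-(window - x[i]) ** 2 / (2 * sigma_r**2))
        w = spatial * similarity
        out[i] = np.dot(w, window) / w.sum()
    return out


# A step edge: eight zeros followed by eight ones.
x = np.concatenate([np.zeros(8), np.ones(8)])

smoothed = fixed_conv(x, np.ones(5) / 5)   # box filter blurs across the edge
preserved = adaptive_weighted(x)           # content-aware weights keep the edge sharp
```

In this sketch the box filter mixes values from both sides of the step, while the content-dependent weights suppress neighbours on the far side of the edge. That is the sense in which a fixed, locally averaging operator cannot follow non-uniform spatial dependencies. The adaptive operator is also no longer linear in `x`, which is one of the comparison dimensions listed in the abstract.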