Feb 26, 2026 · arXiv:2602.23217

Multidimensional Task Learning: A Unified Tensor Framework for Computer Vision Tasks

Alaa El Ichi, Khalide Jbilou

AI Summary

The paper introduces Multidimensional Task Learning (MTL), a unified framework for computer vision tasks based on Generalized Einstein MLPs (GE-MLPs) that operate directly on tensors using the Einstein product. It addresses the limitations of matrix-based architectures by using tensor-valued parameters, enabling explicit control over dimensional preservation and contraction. The authors demonstrate that classification, segmentation, and detection are special cases of MTL and prove that the task space expressible by MTL is strictly larger than that of matrix-based formulations.

Key Contribution

Escape the tyranny of matrix-based thinking: a new tensor framework reveals how classification, segmentation, and detection are just different slices of the same multidimensional computer vision task.

Abstract

This paper introduces Multidimensional Task Learning (MTL), a unified mathematical framework based on Generalized Einstein MLPs (GE-MLPs) that operate directly on tensors via the Einstein product. We argue that current computer vision task formulations are inherently constrained by matrix-based thinking: standard architectures rely on matrix-valued weights and vector-valued biases, requiring structural flattening that restricts the space of naturally expressible tasks. GE-MLPs lift this constraint by operating with tensor-valued parameters, enabling explicit control over which dimensions are preserved or contracted without information loss. Through rigorous mathematical derivations, we demonstrate that classification, segmentation, and detection are special cases of MTL, differing only in their dimensional configuration within a formally defined task space. We further prove that this task space is strictly larger than what matrix-based formulations can natively express, enabling principled task configurations such as spatiotemporal or cross-modal predictions that require destructive flattening under conventional approaches. This work provides a mathematical foundation for understanding, comparing, and designing computer vision tasks through the lens of tensor algebra.
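To make the idea of preserving versus contracting dimensions concrete, here is a minimal NumPy sketch of an Einstein-product layer. It is not the paper's GE-MLP definition: the function names `einstein_product` and `ge_layer`, the ReLU nonlinearity, the (channels, height, width) layout, and all shapes are assumptions chosen for illustration. The Einstein product is taken here in its usual sense, contracting the last N modes of the weight tensor with the first N modes of the input.

```python
import numpy as np


def einstein_product(A, X, n_contract):
    """Contract the last `n_contract` modes of A with the first `n_contract` modes of X."""
    return np.tensordot(A, X, axes=n_contract)


def ge_layer(X, W, B, n_contract):
    """Hypothetical tensor layer: ReLU(W *_N X + B) with tensor-valued W and B."""
    return np.maximum(einstein_product(W, X, n_contract) + B, 0.0)


rng = np.random.default_rng(0)
C, H, Wd, K = 3, 4, 5, 7                 # channels, height, width, classes (assumed sizes)
X = rng.standard_normal((C, H, Wd))      # image-like input tensor

# Classification-like configuration: contract all three input modes,
# leaving a single mode of K class scores.
W_cls = rng.standard_normal((K, C, H, Wd))
B_cls = np.zeros(K)
y_cls = ge_layer(X, W_cls, B_cls, n_contract=3)

# Segmentation-like configuration: contract only the channel mode,
# so both spatial modes are preserved in the output.
W_seg = rng.standard_normal((K, C))
B_seg = np.zeros((K, H, Wd))             # tensor-valued bias
y_seg = ge_layer(X, W_seg, B_seg, n_contract=1)

# Contracting every mode coincides with the familiar flatten-then-matmul layer.
pre_tensor = einstein_product(W_cls, X, 3)
pre_matrix = W_cls.reshape(K, -1) @ X.reshape(-1)
same_as_flattened = np.allclose(pre_tensor, pre_matrix)

print(y_cls.shape, y_seg.shape, same_as_flattened)
```

In this sketch the two task-like configurations differ only in how many modes are contracted and in the shapes of the weight and bias tensors, which mirrors the abstract's claim that the tasks differ only in dimensional configuration. The final check shows that the fully contracted case reduces to an ordinary matrix layer on the flattened input.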

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 7

Year: 2026

Venue: N/A
