This paper analyzes the approximation capabilities of deep residual networks by framing them as continuous dynamical systems and quantifying the minimal time (depth) needed to approximate diffeomorphisms with flows driven by a family of vector fields. It establishes a connection between this minimal time and a geodesic distance on a sub-Finsler manifold of diffeomorphisms, linking learning efficiency to the compatibility between target relationships and architectural choices. The study highlights a fundamental difference between approximation in deep learning (composition/dynamics) and linear approximation theory (linear spaces/norm-based estimates), emphasizing manifolds and geodesic distances.
Deep learning's approximation power hinges on geodesic distances on manifolds, not just linear spaces, revealing a fundamental departure from classical approximation theory.
We investigate the dependence of the approximation capacity of deep residual networks on their depth in a continuous dynamical systems setting. This can be formulated as the general problem of quantifying the minimal time-horizon required to approximate a diffeomorphism by flows driven by a given family $\mathcal F$ of vector fields. We show that this minimal time can be identified as a geodesic distance on a sub-Finsler manifold of diffeomorphisms, where the local geometry is characterised by a variational principle involving $\mathcal F$. This connects the learning efficiency of target relationships to their compatibility with the learning architectural choice. Further, the results suggest that the key approximation mechanism in deep learning, namely the approximation of functions by composition or dynamics, differs in a fundamental way from linear approximation theory: linear spaces and norm-based rate estimates are replaced by manifolds and geodesic distances.
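The continuous-time view in the abstract can be made concrete with a small sketch. It relies on the standard identification of a residual block $x \mapsto x + h f(x)$ with one forward Euler step of the flow $\dot x = f(x)$, so that a network of a given depth with step size $h$ covers a time horizon $T = \text{depth} \cdot h$. The function names `residual_flow` and `decay`, and the choice of vector field $f(x) = -x$, are illustrative assumptions and are not taken from the paper.

```python
import math


def residual_flow(x0, vector_field, horizon, depth):
    """Apply `depth` residual blocks x <- x + h * f(x), with h = horizon / depth.

    This is forward Euler integration of dx/dt = f(x) over [0, horizon].
    """
    h = horizon / depth
    x = x0
    for _ in range(depth):
        x = x + h * vector_field(x)
    return x


def decay(x):
    # Hypothetical vector field f(x) = -x, chosen only because its exact
    # time-T flow map is known in closed form: x -> exp(-T) * x.
    return -x


if __name__ == "__main__":
    x0, horizon = 1.0, 2.0
    exact = math.exp(-horizon) * x0
    for depth in (4, 16, 64, 256):
        approx = residual_flow(x0, decay, horizon, depth)
        # The gap to the exact flow map shrinks as depth grows at fixed horizon.
        print(depth, approx, abs(approx - exact))
```

With the horizon held fixed, adding blocks only refines the discretization of the same flow map. The question posed in the abstract concerns the horizon itself: how small $T$ can be while flows driven by $\mathcal F$ still reach the target map.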