This paper investigates why deep neural networks are amenable to compression by comparing the algorithmic complexity of trained weights with that of randomly initialized weights. The authors hypothesize that trained models exhibit lower algorithmic complexity because their weights are more structured and repetitive. To test this, they introduce Mosaic-of-Motifs (MoMos), a constrained parameterization that partitions weights into blocks, each selected from a set of reusable motifs, which lowers the Kolmogorov complexity of the parameterization. Experiments show that MoMos models perform comparably to unconstrained models while exhibiting lower algorithmic complexity, as measured by approximations to Kolmogorov complexity.
Trained neural networks aren't just smaller after compression; they're fundamentally *simpler*, and this paper argues the point by building models from a "mosaic" of repeating weight patterns.
Large-scale deep learning models are well-suited for compression. Methods like pruning, quantization, and knowledge distillation have been used to achieve massive reductions in the number of model parameters, with marginal performance drops across a variety of architectures and tasks. This raises the central question: *Why are deep neural networks suited for compression?* In this work, we take up the perspective of algorithmic complexity to explain this behavior. We hypothesize that the parameters of trained models have more structure and, hence, exhibit lower algorithmic complexity compared to the weights at (random) initialization. We further hypothesize that model compression methods harness this reduced algorithmic complexity to compress models. Although an unconstrained parameterization of model weights, $\mathbf{w} \in \mathbb{R}^n$, can represent arbitrary weight assignments, the solutions found during training exhibit repeatability and structure, making them algorithmically simpler than a generic program. To this end, we formalize the Kolmogorov complexity of $\mathbf{w}$ as $\mathcal{K}(\mathbf{w})$. We introduce a constrained parameterization $\widehat{\mathbf{w}}$ that partitions parameters into blocks of size $s$ and restricts each block to be selected from a set of $k$ reusable motifs, specified by a reuse pattern (or mosaic). The resulting method, *Mosaic-of-Motifs* (MoMos), yields an algorithmically simpler model parameterization than unconstrained models. Empirical evidence from multiple experiments shows that the algorithmic complexity of neural networks, measured using approximations to Kolmogorov complexity, can be reduced during training. This results in models that perform comparably with unconstrained models while being algorithmically simpler.
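The block-and-motif structure described in the abstract can be illustrated with a small sketch. This is not the authors' training procedure: it only builds a weight vector from $k$ motifs of length $s$ and a reuse pattern, then compares its compressed size against an unconstrained random vector. The function names `build_mosaic_weights` and `compressed_size`, the sizes chosen, and the use of zlib as a stand-in for the "approximations to Kolmogorov complexity" are all assumptions made for illustration; the abstract does not say which approximation the paper uses.

```python
import random
import struct
import zlib


def build_mosaic_weights(motifs, mosaic):
    """Expand a reuse pattern (mosaic) of motif indices into a flat weight list."""
    weights = []
    for motif_index in mosaic:
        weights.extend(motifs[motif_index])
    return weights


def compressed_size(weights):
    """Size in bytes of the float32-packed weights after zlib compression.

    Assumption: compressed length serves as a crude, computable upper-bound
    proxy for Kolmogorov complexity, which is itself uncomputable.
    """
    raw = struct.pack(f"{len(weights)}f", *weights)
    return len(zlib.compress(raw, 9))


rng = random.Random(0)
n, s, k = 4096, 8, 16  # hypothetical sizes: total weights, block size, motif count

# Unconstrained baseline: n independent Gaussian weights, like a random init.
unconstrained = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Constrained version: k motifs of length s, plus one motif index per block.
motifs = [[rng.gauss(0.0, 1.0) for _ in range(s)] for _ in range(k)]
mosaic = [rng.randrange(k) for _ in range(n // s)]
constrained = build_mosaic_weights(motifs, mosaic)

size_unconstrained = compressed_size(unconstrained)
size_constrained = compressed_size(constrained)
print(f"unconstrained: {size_unconstrained} bytes, mosaic: {size_constrained} bytes")
```

With these sizes the constrained vector is fully described by $k \cdot s = 128$ motif values plus $n / s = 512$ block indices, instead of $n = 4096$ free values, which is why a general-purpose compressor finds it much easier to shrink.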