Search papers, labs, and topics across Lattice.
This paper introduces AxMoE, the first study analyzing the impact of approximate multipliers on Mixture-of-Experts (MoE) DNNs across CNNs and Transformers. They evaluate Hard, Soft, and Cluster MoE variants against dense baselines using various approximate 8-bit multipliers, both with and without retraining. Results demonstrate that the resilience and recovery after approximation varies significantly across architectures, MoE topologies, and multiplier types, highlighting the need for architecture-aware approximation strategies.
Approximate computing can break MoEs in unexpected ways, with dense networks sometimes proving more robust, but careful retraining can unlock surprising efficiency gains in specific architectures.
Deep neural network (DNN) inference at the edge demands simultaneous improvements in accuracy, computational efficiency, and energy consumption. Approximate computing and Mixture-of-Experts (MoE) architectures have each been studied as independent routes towards efficient inference, the former by replacing exact arithmetic with low-power approximate multipliers, the latter by routing inputs through specialized expert sub-networks to enable conditional computation. However, their interaction remains entirely unexplored. This paper presents AxMoE, the first study of the impact of approximate multiplication on MoE DNN architectures. We evaluate three MoE variants: Hard MoE, Soft MoE, and Cluster MoE against dense baselines across three CNN architectures (ResNet-20, VGG11_bn, VGG19_bn) on CIFAR-100 and a Vision Transformer (ViT-Small) on Tiny ImageNet-200 dataset, using eight 8-bit signed multipliers (including one exact baseline) from the EvoApproxLib library. Results show that, without retraining, the Dense baseline is the most resilient topology across all CNN architectures, whereas on ViT-Small, all topologies degrade at comparable rates regardless of routing strategy. After approximate-aware retraining, recovery varies substantially across architectures, topologies, and multipliers. ResNet-20 achieves full recovery across the entire multiplier range, whereas VGG architectures recover at moderate multipliers but fail irreversibly at aggressive ones for all topologies except Cluster MoE on VGG11_bn; on ViT-Small, Hard MoE outperforms Dense under aggressive approximation at equal normalized inference cost. These results pave the way for future approximate MoE hardware-software co-design strategies.