Search papers, labs, and topics across Lattice.
InstMeter is introduced as a novel method for predicting the energy and latency of deep learning model inference on microcontrollers (MCUs) by leveraging clock cycles as a fundamental metric. The method exploits the strong linearity between clock cycles and energy/latency, enabling accurate predictions with minimal training data. Experiments across various MCUs, software settings, and application scenarios demonstrate that InstMeter reduces energy and latency prediction errors by 3x and 6.5x, respectively, compared to state-of-the-art methods, while requiring significantly less training data.
Forget MACs and parameters: accurately predict DL model energy and latency on MCUs with 3x and 6.5x lower error using just clock cycles.
Deep learning (DL) models can now run on microcontrollers (MCUs). Through neural architecture search (NAS), we can search DL models that meet the constraints of MCUs. Among various constraints, energy and latency costs of the model inference are critical metrics. To predict them, existing research relies on coarse proxies such as multiply-accumulations (MACs) and model's input parameters, often resulting in inaccurate predictions or requiring extensive data collection. In this paper, we propose InstMeter, a predictor leveraging MCUs' clock cycles to accurately estimate the energy and latency of DL models. Clock cycles are fundamental metrics reflecting MCU operations, directly determining energy and latency costs. Furthermore, a unique property of our predictor is its strong linearity, allowing it to be simple and accurate. We thoroughly evaluate InstMeter under different scenarios, MCUs, and software settings. Compared with state-of-the-art studies, InstMeter can reduce the energy and latency prediction errors by $3\times$ and $6.5\times$, respectively, while requiring $100\times$ and $10\times$ less training data. In the NAS scenario, InstMeter can fully exploit the energy budget, identifying optimal DL models with higher inference accuracy. We also evaluate InstMeter's generalization performance through various experiments on three ARM MCUs (Cortex-M4, M7, M33) and one RISC-V-based MCU (ESP32-C3), different compilation options (-Os, -O2), GCC versions (v7.3, v10.3), application scenarios (keyword spotting, image recognition), dynamic voltage and frequency scaling, temperatures (21°C, 43°C), and software settings (TFLMv2.4, TFLMvCI). We will open our source codes and the MCU-specific benchmark datasets.