This paper investigates inference time as a proxy for estimating the energy consumption of API-based Large Language Models (LLMs), addressing the opacity of these costs to users. By comparing time-based estimations with actual energy measurements from locally hosted LLMs, the authors demonstrate the feasibility of inferring the underlying GPU models used by APIs. The findings suggest that inference time can be a useful indicator for approximating the energy footprint of black-box LLM APIs.
Inference time can reveal the GPU models behind black-box LLM APIs, offering a way to estimate their hidden energy costs.
The energy consumption of Large Language Models (LLMs) raises growing concerns due to its environmental impact and resource demands. Yet these energy costs remain largely opaque to users, especially when models are accessed through an API, a black box in which all available information depends on what providers choose to disclose. In this work, we investigate inference time measurements as a proxy for approximating the energy costs of API-based LLMs. We ground our approach by comparing our estimations with actual energy measurements from locally hosted equivalents. Our results show that time measurements allow us to infer the GPU models behind API-based LLMs, which in turn grounds our energy cost estimations. Our work aims to give end users a means of understanding the energy costs associated with API-based LLMs.
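To make the proxy concrete, the sketch below illustrates the basic idea of converting a measured API latency into an energy estimate once a GPU model has been inferred from timing. The power figures, function names, and the simple E ≈ P × t model are illustrative assumptions for this sketch, not values or code from the paper.

```python
import time

# Illustrative average power draw under inference load, in watts, per GPU
# model. These numbers are assumptions for this sketch, not measurements
# reported in the paper.
GPU_POWER_WATTS = {
    "V100": 250.0,
    "A100": 300.0,
    "H100": 450.0,
}

def timed_call(call_fn):
    """Run an API call and return (response, wall-clock latency in seconds)."""
    start = time.perf_counter()
    response = call_fn()
    return response, time.perf_counter() - start

def estimate_energy_joules(latency_s: float, gpu_model: str) -> float:
    """Rough estimate E ≈ P_gpu * t_inference once a GPU model is inferred."""
    return GPU_POWER_WATTS[gpu_model] * latency_s

# Example with a stand-in for a real API call: a 2.5 s completion on an
# (inferred) A100 would come to roughly 300 W * 2.5 s = 750 J ≈ 0.21 Wh.
_, latency = timed_call(lambda: time.sleep(2.5))
energy_j = estimate_energy_joules(latency, "A100")
print(f"~{energy_j:.0f} J ({energy_j / 3600:.3f} Wh)")
```

In practice, wall-clock latency also includes network and queueing overhead, so a time-based estimate of this kind is an upper-bound heuristic rather than a precise measurement.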