Magazine #2 | Summer 2023
Ignoring Inference When Calculating Resource Consumption
When discussing the environmental impact of AI systems, the focus tends to fall on the volume of resources consumed during the development and training phases of Machine Learning models. In fact, the figures published alongside such models usually refer only to these phases. A big question mark, however, hangs over the utilization phase of AI systems, known in technical jargon as the “inference” phase. Developing and training AI models are complex processes that consume a relatively large amount of energy, but the number of such processes is limited, and they are usually completed within a foreseeable timeframe. Each use of an AI system during the inference stage, by contrast, consumes relatively little energy, yet inference can take place extremely frequently.

In late 2022, Facebook AI researchers concluded in a scientific paper that Facebook data centers performed trillions of inference operations each day, a figure they say has doubled in the last three years. This growth in inference has also driven an expansion of the infrastructure required to support it: between the beginning of 2018 and mid-2019, the number of servers devoted specifically to inference at Facebook’s data centers increased by a factor of 2.5, according to the study. At a company like Facebook, this volume of inference comes from recommendation and ranking algorithms, for example – algorithms that run each time one of Facebook’s nearly 3 billion users worldwide accesses the platform and views content in their newsfeed. Other applications that contribute to high inference rates on online platforms include image classification, object recognition in images, and translation and speech recognition services based on large language models.
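To get a sense of the orders of magnitude involved, consider a rough back-of-envelope calculation, sketched below in Python. The per-operation energy figure is a purely illustrative assumption, not a measured value; only the “trillions of operations per day” scale comes from the Facebook study cited above.

```python
# Back-of-envelope estimate: how tiny per-inference energy costs add up.
# ASSUMPTION: 0.001 Wh per inference is an illustrative placeholder,
# not a measured figure. The daily operation count reflects the
# "trillions per day" reported for Facebook's data centers.

ENERGY_PER_INFERENCE_WH = 0.001   # assumed: one milliwatt-hour per operation
INFERENCES_PER_DAY = 3e12         # "trillions" of operations per day

daily_energy_kwh = ENERGY_PER_INFERENCE_WH * INFERENCES_PER_DAY / 1_000
yearly_energy_gwh = daily_energy_kwh * 365 / 1_000_000

print(f"Daily inference energy:  {daily_energy_kwh:,.0f} kWh")
print(f"Yearly inference energy: {yearly_energy_gwh:,.0f} GWh")
```

Even with such a small assumed figure per operation, the annual total lands in the range of gigawatt-hours – it is the sheer volume, not the individual operation, that drives consumption.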
Even if the amount of energy consumed by each inference operation were minimal, total resource consumption is still likely to be immense due to the sheer volume of operations and the infrastructure they require. The CEO of Nvidia, one of the largest processor manufacturers, and executives at Amazon Web Services (AWS), one of the largest cloud computing providers, estimated back in 2019 that inference accounts for approximately 90 percent of the costs of the entire Machine Learning process. Because those costs are closely linked to the computing power required, scientists have concluded that the emissions produced in the inference phase of AI models are likely to be significantly higher than those produced during development and training. This assessment is supported by internal figures from Facebook, which show that, depending on the application, resource consumption for in-house systems during the inference phase can be far higher than during development and training.
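The logic behind that conclusion can be made explicit with a simple proportion, sketched below. The absolute training-emissions figure is a hypothetical placeholder; only the roughly 90/10 cost split is taken from the statements cited above, and the sketch assumes that emissions scale linearly with compute cost.

```python
# If inference accounts for ~90% of total Machine Learning compute cost
# (per the Nvidia/AWS statements) and emissions scale with compute cost,
# then inference emissions work out to roughly 9x the training emissions.
# ASSUMPTION: the 100 t CO2e training figure is purely illustrative.

INFERENCE_COST_SHARE = 0.90        # from the cited 90 percent estimate
TRAINING_EMISSIONS_T_CO2E = 100.0  # assumed placeholder for one model

ratio = INFERENCE_COST_SHARE / (1 - INFERENCE_COST_SHARE)  # = 9
inference_emissions = TRAINING_EMISSIONS_T_CO2E * ratio

print(f"Inference-to-training emissions ratio: {ratio:.0f}x")
print(f"Estimated inference emissions: {inference_emissions:.0f} t CO2e")
```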
As such, it would be negligent to disregard the inference phase when calculating the energy consumption of AI systems. When determining the resource consumption of automobiles, after all, we don’t ignore the gasoline consumed while driving.