Magazine #2 | Summer 2023
PaLM: The Wrong Understanding of Efficiency
Google’s 540-Billion-Parameter Language Model PaLM
In spring 2022, Google researchers unveiled a new AI language model called the Pathways Language Model, or PaLM. It can interpret input text and generate new, meaningful text of its own. Whereas previous models like BERT (110 million parameters) or GPT-3 (175 billion parameters) attracted attention for their enormous size, PaLM set a new record with 540 billion parameters. Parameters are the values a Machine Learning model learns during training; they form the basis for the outputs the model then produces.
The number of parameters also largely determines the number of computing operations that must be performed and, thus, the amount of energy consumed. A model with 540 billion parameters therefore likely consumes an extremely large amount of energy – during development, during training and, presumably, also during use.
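To illustrate the scale involved, a common rule of thumb in the research literature estimates training compute at roughly 6 floating-point operations per parameter per training token. The sketch below is not from the article; the 780-billion-token figure is an assumption drawn from the PaLM paper, and the result is only a back-of-envelope order of magnitude:

```python
# Back-of-envelope estimate of PaLM's training compute, using the
# common rule of thumb: total FLOPs ≈ 6 × parameters × training tokens.
# The token count is an assumption (reported in the PaLM paper), not
# a figure from this article.

PARAMS = 540e9   # 540 billion parameters (from the article)
TOKENS = 780e9   # ~780 billion training tokens (assumption)

total_flops = 6 * PARAMS * TOKENS
print(f"Estimated training compute: ~{total_flops:.2e} FLOPs")
# On the order of 10^24 floating-point operations – which is why the
# parameter count translates so directly into energy consumption.
```

The point of the estimate is simply that compute, and with it energy use, grows in step with model size and training data.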
At the same time, the Google research team claims to have made a breakthrough in training efficiency. This advance was achieved, the researchers say, through the latest generation of Google's custom Tensor Processing Unit (TPU) hardware, which accelerates computation, and through new strategies in parallel computing. Google says it was thereby able to significantly reduce the time needed to train the vast model, thus saving energy.
A single training run of PaLM at a Google data center in Oklahoma, which obtains 89 percent of its energy from carbon-free sources, resulted in 271.43 tons of CO2 emissions. That is roughly equivalent to the emissions produced by a fully occupied commercial jet on 1.5 flights across the United States.
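The article's own numbers allow a quick sanity check of that comparison. The per-flight figure below is derived from the two values in the text, not stated anywhere in the source:

```python
# Sanity check of the comparison in the text: 271.43 t of CO2 is said
# to equal roughly 1.5 fully occupied cross-US commercial jet flights.
training_emissions_t = 271.43   # tons of CO2 for one training run (from the text)
flights = 1.5                   # equivalent cross-US flights (from the text)

per_flight_t = training_emissions_t / flights
print(f"Implied emissions per flight: ~{per_flight_t:.0f} t CO2")
# → Implied emissions per flight: ~181 t CO2
```

An implied figure of roughly 181 tons per fully occupied cross-country flight is plausible, so the comparison is internally consistent – the criticism that follows concerns what the number leaves out, not the number itself.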
Comparative figures for the emissions produced by previous models during training are mostly based on estimates. As such, one can only assume that around 270 tons of CO2 for a system as large as PaLM represents, in relative terms, a significant improvement. But the question remains: why were more efficient hardware and new training methods deployed only to make models even larger, rather than to improve the energy efficiency of smaller, yet still quite substantial, models? That isn't just irresponsible from the perspective of resource conservation. Such vast models also make it more difficult to detect and remove discriminatory, misogynistic and racist content from the training data.
Machine Learning research has not yet focused sufficiently on resource conservation – and it’s not just Big Tech companies that must be held accountable on the issue. The example of PaLM once again clearly shows that the mentality of “the bigger the better” continues to dominate Machine Learning research, which stands in direct contrast to the urgent need to reduce resource consumption in the entire digital sector, especially in the resource-intensive AI branch.
Furthermore, Google’s emphasis on the comparatively low emissions produced during the training of PaLM is misleading. The training of a model never reflects the total emissions generated – indeed, it is often just a fraction of that total. A comprehensive assessment of an AI system’s resource efficiency must also quantify the emissions produced during development and deployment, along with those associated with the hardware used. Google could at least have specified how many training runs it performed during the development phase and how high emissions were in total. That, though, would likely have produced a significantly different picture – and the green finish Google has sought to apply to itself would quickly have flaked off.