Magazine #2 | Summer 2023
Community Building: Deploying Open Source Against Megalomania
True to her view that the climate crisis is the greatest global challenge we currently face, Machine Learning (ML) researcher Sasha Luccioni focuses her efforts at Hugging Face on making Artificial Intelligence (AI) models more sustainable. The AI startup has set out to tackle problems like emissions, bias and discrimination by supporting open-source approaches in the ML community. Luccioni provides insight into measuring the carbon footprint of AI models, while Hugging Face Policy Director Irene Solaiman explains how this will help policymakers generate much-needed pressure.
Interview with Dr Sasha Luccioni and Irene Solaiman
Open source helps with the recycling of models. Instead of a transformer model being trained and used only once, it can be reused. All the pretrained models on Hugging Face can be fine-tuned for specific use cases, which is definitely more environmentally friendly than creating a model from scratch. Several years ago, the main approach was to accumulate as much data as possible to train a model, which would then not be shared. Now, data-intensive models are shared after training, and people can reuse and fine-tune them according to their particular use cases.
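To illustrate the reuse pattern Luccioni describes, here is a minimal sketch using the Hugging Face transformers and datasets libraries; the checkpoint and dataset names are only examples, and the training settings are deliberately small.

```python
# Reuse a shared pretrained checkpoint and fine-tune it for a downstream task,
# rather than training a new transformer from scratch.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"  # example pretrained model from the Hub
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("imdb")  # example downstream use case: sentiment analysis
encoded = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True),
                      batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1),
    train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),
    tokenizer=tokenizer,
)
trainer.train()  # only the fine-tuning compute is spent; the pretraining is reused
```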
With the size of transformer and AI models growing bigger and bigger, the entry barrier for joining the AI community is becoming correspondingly high, especially for countries that don’t have access to the extremely powerful computers needed to create these models. Hugging Face offers several options for such cases – for example, the ability to query a large language model through an API, so you don’t need to run it on your own computer. This makes such models more accessible.
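As a rough sketch of what such remote querying looks like, assuming the hosted Inference API and BLOOM purely as an example model (the prompt and token are placeholders):

```python
# Query a large hosted model over the Hugging Face Inference API
# instead of running it on your own hardware.
import requests

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
headers = {"Authorization": "Bearer hf_..."}  # your Hugging Face access token

payload = {
    "inputs": "The most important step towards sustainable AI is",
    "parameters": {"max_new_tokens": 50},
}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())  # the generated text is returned by the remote service
```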
The regulations pertaining to AI that have been issued in recent years haven’t focused particularly on sustainability. Measuring carbon emissions has likewise not been prioritized, and there aren’t many tools available to measure them adequately. We find ourselves in a dilemma: There is an urgent need for policymakers to up the pressure, but to do so, they need emissions data. Current regulations, however, do not require the deployment of tools for measuring emissions – which means that policymakers don’t have the data they need.
The European Union’s Artificial Intelligence Act is one of the most robust and prominent approaches to regulating AI in the public’s interest. A lot of policies and regulations are necessarily coming from countries with higher gross domestic products, such as the Algorithmic Accountability Act in the United States and the AI and Data Act in Canada. The Algorithmic Accountability Act does not explicitly include sustainability, but I appreciate the emphasis it places on impact assessments. Decision-makers need more guidance on the impact of AI systems, including CO2 emissions. Such information will give them a greater understanding of the importance of developing appropriate tools.
These models are trained on data scraped from all over the internet. Because they don’t rely on a single, limited data source, they’re supposed to be relatively impartial. But when you use them in a downstream AI application, they generate outputs you may not have expected. To figure out where potential biases could emerge, you have to make AI applications make decisions or predictions in different situations. We’ve been working on ways of prompting the models by giving them bits of text and having them complete them – based on a pronoun, for example, as in “She should work as” and “He should work as.” If a model continues, “She should work as a nurse” and “He should work as a computer scientist,” you can immediately see how biased, how toxic, it is. Such negative stereotypes are one example of system bias, which we can document for every AI model by creating a report card.
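A minimal sketch of this kind of prompt-based probing, using the transformers text-generation pipeline and GPT-2 purely as an example model:

```python
# Compare how a language model completes minimally different prompts
# to surface occupational gender stereotypes.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")  # example model
set_seed(0)

for prompt in ["She should work as a", "He should work as a"]:
    completions = generator(prompt, max_new_tokens=10,
                            do_sample=True, num_return_sequences=5)
    print(prompt)
    for completion in completions:
        print("  ", completion["generated_text"])
```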
Most of the emissions numbers we have are from training. We don’t have many numbers from deployment. A lot of people are interested in how much CO2 will be emitted through deployment, but that’s extremely complicated, because it depends on a number of factors, including the hardware you’re using and where the computing is being done. Without knowing those factors, it’s impossible to provide information on the emissions. In order to do so, you would need to evaluate different architectures, different models, different GPUs, etc. Still, a lot of people would find such information extremely useful.
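As a back-of-the-envelope illustration of why those factors matter, here is a rough estimate built entirely on assumed values for hardware power draw, data-center overhead and grid carbon intensity:

```python
# Rough deployment-emissions estimate; every value below is an assumption
# chosen for illustration, not a measurement.
gpu_power_kw = 0.3        # assumed average draw per GPU (300 W)
num_gpus = 8              # assumed size of the serving cluster
hours = 24 * 30           # one month of continuous serving
pue = 1.2                 # assumed data-center Power Usage Effectiveness
carbon_intensity = 0.4    # assumed grid intensity in kg CO2eq per kWh

energy_kwh = gpu_power_kw * num_gpus * hours * pue
emissions_kg = energy_kwh * carbon_intensity
print(f"{energy_kwh:.0f} kWh -> {emissions_kg:.0f} kg CO2eq per month")
```

Swapping in a different grid – a largely hydro-powered region instead of a coal-heavy one, say – changes the last factor, and therefore the result, by an order of magnitude, which is why the location of the computing matters so much.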
If people start using tools to measure the emissions of their ML models and disclose that information, we can start making decisions about AI models based on facts and figures. Tools like Code Carbon calculate a model’s carbon footprint in real time: the program runs alongside any code and estimates the carbon emissions at the end. We also run a website that lets you enter information like training hours and the type of hardware used, and it then provides an estimate of the system’s carbon footprint. It is less precise than Code Carbon, but it still gives you an idea.
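A minimal sketch of how the Code Carbon tracker wraps a training run (the training function itself is a hypothetical placeholder):

```python
# Track the emissions of a training run with Code Carbon.
# The tracker runs alongside the code and reports kg CO2eq when stopped.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="my-model-training")
tracker.start()
train_model()                  # hypothetical placeholder for your training code
emissions_kg = tracker.stop()  # returns the estimate in kg CO2eq
print(f"Estimated emissions: {emissions_kg:.3f} kg CO2eq")
```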
I think that bottom-up approaches work, especially in terms of research. At conferences, we are constantly asked for more information. But there’s the issue of reproducibility: A lot of research can’t be reproduced because it is highly contingent upon specific factors. This is something the AI community has been trying to tackle by implementing certain guidelines. If you submit a paper, you have to disclose parameters X, Y and Z. You also have to make your code and data freely available. In terms of sustainability, similar measures have to be in place, as there already are for efficiency or accuracy. Only then can we compare different models. We have to provide a technical procedure that a broader community can adopt.
A lot of policy conversations I’ve been involved in have focused on lowering the regulatory burden on small and medium-sized enterprises, since these companies have fewer resources than Big Tech. Because smaller companies are less likely to have the infrastructure for analyzing their carbon emissions, we can’t expect them to be responsible for monitoring those emissions.
Something we’ve been working on is documentation. We need more guidance from policy institutions on what, specifically, would be helpful to report over and above the information included on model cards. A lot of governments have been asking the industry for more information about models without specifying what aspects of AI sustainability the industry and developers should report. We also need to know how to report that information in ways that are understandable to high-level policymakers, who may not have a technical background. Developers definitely need more information if we want them to think about how their systems can become more sustainable.
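One hedged example of what such standardized reporting could look like is the emissions metadata that model cards on the Hugging Face Hub can already carry; the values below are invented purely for illustration:

```python
# Illustrative sketch of emissions metadata for a model card, loosely
# following the co2_eq_emissions convention used on the Hugging Face Hub.
# All values are invented for illustration.
co2_eq_emissions = {
    "emissions": 25_000_000,            # grams of CO2eq for the training run
    "source": "Code Carbon",            # how the figure was measured
    "training_type": "pre-training",
    "geographical_location": "example region",
    "hardware_used": "example GPU cluster",
}
```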
When we look at the infrastructure, there are both positive and negative developments. Hardware development is making rapid progress when it comes to computing efficiency. If you compare a GPU from this year to one built two or three years ago, there’s a significant difference. It’s literally 10 times faster. But with this positive development comes a negative one, because that efficiency leap means that people are doing more computing. It’s a rebound effect. If we kept the size of our models and the amount of computation needed at a constant level, we would definitely be going in the right direction. But since both are growing so fast, it’s hard to say where we might end up. I do see cloud providers taking advantage of carbon offsetting, and some are switching to renewable energy sources. On the other hand, though, the concept of “the bigger the better” in AI modeling is getting out of hand.
Background
According to DALL·E 2, Women and POC Can’t Be Authors
Transformer models have become the standard for large language models. Search engines, automated translation services, content moderation systems, speech recognition tools, text-to-image generators and many other applications are based on them. The underlying deep learning models are usually trained on extremely large datasets in order to deduce the intrinsic structures within them. These structures form the basis for the model’s ability to transform input data into output data – for instance, when generating an image from a text prompt. Recently, models have been released that can generate text or images from text prompts (GPT-3, ChatGPT, Stable Diffusion, DALL·E, etc.). While their capabilities are undoubtedly impressive, they come with serious risks: Usually trained on unfiltered data scraped from the internet, they often create discriminatory, racist, misogynist or otherwise deeply biased content.

Researchers at Hugging Face have developed a tool to reveal the bias inherent in text-to-image generators. The tool generates prompts for DALL·E and Stable Diffusion by combining entries from a list of 150 professions with 20 related adjectives, and this Bias Explorer clearly demonstrates how prone to bias these models can be. If you select the profession “author” and combine it with each of the 20 available adjectives, DALL·E 2 generates 179 images of white men and just one image of a white woman. Stable Diffusion (version 1.4) performs only slightly better, generating 13 images of persons of color out of a total of 180. When it comes to gender representation, Stable Diffusion clearly demonstrates a female bias, generating 140 pictures of female authors out of 180 overall.
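The underlying idea of the tool – a grid of profession-adjective prompts fed to an image generator – can be sketched roughly as follows, using the diffusers library and Stable Diffusion v1.4; the word lists and prompt template here are illustrative, not the tool’s actual ones:

```python
# Sketch of the Bias Explorer idea: combine professions and adjectives into
# prompts and generate an image for each combination for later inspection.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

professions = ["author", "nurse", "computer scientist"]   # the tool uses 150
adjectives = ["ambitious", "compassionate", "assertive"]  # the tool uses 20

for profession in professions:
    for adjective in adjectives:
        prompt = f"Photo portrait of a {adjective} {profession}"
        image = pipe(prompt).images[0]
        image.save(f"{profession}_{adjective}.png".replace(" ", "_"))
```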
AI Lifecycle and CO2: The Emissions Never Stop
There is hardly any information available about the energy consumption of AI systems and the CO2 emissions they produce. This lack of data makes it more difficult to develop targeted political approaches aimed at reducing those emissions. It is well known that data centers, like the production and operation of all hardware, contribute significantly to global CO2 emissions, and it is data centers that provide the infrastructure needed to operate AI systems. What is missing are reliable numbers on the emissions produced by the actual use of AI systems.
Sasha Luccioni, Sylvain Viguier and Anne-Laure Ligozat have taken a first step toward closing this information gap. They have estimated the emissions produced by the language model BLOOM (176 billion parameters) over much of its lifecycle. The result: If only direct energy consumption is taken into account, the training of BLOOM produced around 24.7 tons of CO2 equivalent. If, however, processes such as hardware manufacture and operational energy consumption are also included in the calculation on a pro-rata basis, that figure doubles. Training alone, in other words, is not a sufficient reference variable when calculating the emissions produced by AI systems. Measurements and methodologically rigorous calculations must cover the entire lifecycle in order to raise awareness among companies, developers and researchers and to enable targeted political regulation.
DR. SASHA LUCCIONI
Research Scientist at Hugging Face, Inc.
She works on the ethical and societal impacts of Machine Learning models and datasets. She is also a Co-Chair of the Carbon Footprint Working Group within the BigScience Workshop and a member of the board of directors of Women in Machine Learning (WiML).
IRENE SOLAIMAN
Policy Director at Hugging Face, Inc.
She conducts social impact research and builds public policy. She also advises responsible AI initiatives at the Organization for Economic Cooperation and Development (OECD) and the Institute of Electrical and Electronics Engineers (IEEE). Irene formerly developed AI policy at the Zillow Group. Before that, she led public policy at OpenAI, where she initiated and led bias and social impact research. Irene holds a master’s degree in public policy from Harvard University.