Magazine #1 | Summer 2022
Rethinking AI As Community Work
- in an interview with Dr. Alex Hanna
Alex Hanna, a sociologist by training, studies how the use of data in new computational technologies exacerbates existing inequalities of gender, ethnicity and class. We spoke with her about why she left her job on Google’s ethics team to join her former supervisor, Timnit Gebru, who had previously been fired from Google.
Interview with Dr. Alex Hanna
Google, Microsoft and Facebook only fund research relating to existing scientific paradigms that optimize their business models. That happens directly or indirectly: in the types of papers they put out and in the funding they give to university researchers, research nonprofits or “AI for Good” projects. Funding guides what problems people work on. They typically don’t fund things that are contrary to their interests; and if they do, it’s in a very limited capacity.
AI is part of the problem when it is used to concentrate and consolidate power and to exacerbate existing inequalities. Most of the time, when AI is implemented in the Big Tech context, the aim is to facilitate recommendation systems, ad targeting or minimizing customer “churn”; in other words, it’s a facilitator for business. AI is also being used in the public sector to minimize the amount of human labor needed for welfare allocation or to identify fraud. But at the same time, it is becoming a tool of surveillance. AI often has the effect of worsening conditions for workers, either by creating a new class of laborers who work for minimal wages to produce data for AI or by optimizing conditions for employers in gig-economy settings to the detriment of workers.
People who are impacted by AI and automated decision-making systems need to have a much greater say in where and when these systems can be deployed. We want to begin by including communities in research activities.
If we are going to rethink AI, we will have to rethink what communities need, especially marginalized racial, ethnic and gender communities. Some of these tools can be used as a means of taking some of that power back or of supporting community decision-making and engagement. Some of the work DAIR is doing points in that direction, for instance the work we’ve done on spatial apartheid and on how AI can support processes of desegregation in South Africa. Another thing we’re looking into is how we can use AI or natural language processing tools to find and identify abusive social media accounts of government actors. We’re trying to recalibrate how AI is used and to find a way that doesn’t concentrate power but instead redistributes it.
Background
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
Published 2021 | Emily M. Bender, Timnit Gebru, et al.
More Harmful than Flying
The “Stochastic Parrots” study builds on previous research work, especially the 2019 paper by Emma Strubell and her collaborators on the carbon emissions and financial costs associated with large language models (“Energy and Policy Considerations for Deep Learning in NLP”). Training large AI models consumes a lot of computing power and hence a lot of electricity. Their energy consumption and carbon footprints have been exploding since 2017, as models have been fed more and more data. Training one version of Google’s language model BERT, which underpins the company’s search engine, produced 1,438 pounds of CO2 emissions, roughly equivalent to a round-trip flight between New York City and San Francisco. And such models aren’t trained just once, but many times over in the course of research and development.
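How do such figures come about? As a rough back-of-envelope sketch (with purely hypothetical hardware and grid numbers, not the measurements used by Strubell and her colleagues), emissions can be approximated as the energy drawn by the training hardware multiplied by the carbon intensity of the electricity that powers it:

```python
# Back-of-envelope estimate of training emissions: the energy drawn by the
# hardware multiplied by the carbon intensity of the electricity grid.
# All numbers below are illustrative placeholders, not measured values.

def training_emissions_lbs(num_gpus: int,
                           avg_power_watts: float,
                           training_hours: float,
                           grid_lbs_co2_per_kwh: float) -> float:
    """Estimate CO2 emissions (in pounds) for a single training run."""
    energy_kwh = num_gpus * avg_power_watts * training_hours / 1000.0
    return energy_kwh * grid_lbs_co2_per_kwh

# Hypothetical example: 64 accelerators drawing ~300 W each for four days,
# on a grid emitting ~1 lb of CO2 per kWh.
print(round(training_emissions_lbs(64, 300.0, 96.0, 1.0)))  # ~1843 lbs
```

Even a crude estimate like this ignores data-center overhead and the many experimental runs that precede a final model, so real-world figures tend to be higher.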
Reproducing Social Distortion
Only rich organizations, the paper argues, have access to the resources required to build and sustain such large AI models, while the effects of the climate change driven by their energy consumption hit marginalized communities the hardest. The training data is generally collected from the internet, so there is a risk that racist, sexist and otherwise abusive language ends up in it. Because the data sets are so large, it is very difficult to audit them for these embedded biases. The authors conclude that large models interpret language in ways that reproduce outdated social norms and patterns of discrimination. These models will also fail to capture the language and the norms of countries and peoples that have less access to the internet and thus a smaller linguistic footprint online.
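What would even a crude audit look like? The following is a minimal, hypothetical sketch, not a method from the paper: scanning a sample of documents for a handful of flagged terms. At web scale this quickly becomes impractical, and it misses the subtler patterns of association in which bias also lives.

```python
from collections import Counter

# Minimal, hypothetical sketch of a crude dataset audit: count how often a few
# flagged terms appear in a sample of documents. Real audits of web-scale
# training data are far harder, and bias also shows up in subtle associations
# that keyword counts cannot detect.

FLAGGED_TERMS = {"slur_a", "slur_b"}   # placeholder terms, not a real lexicon

def audit_sample(documents: list[str]) -> Counter:
    """Return counts of flagged terms across a sample of documents."""
    counts = Counter()
    for doc in documents:
        for token in doc.lower().split():
            if token in FLAGGED_TERMS:
                counts[token] += 1
    return counts

sample = ["an example document", "another document mentioning slur_a"]
print(audit_sample(sample))            # Counter({'slur_a': 1})
```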
The Costs of Profit
According to Timnit Gebru and her colleagues, another issue with large language models is the risk of “misdirected research effort.” They argue that these models don’t actually understand language: they just parrot what was put into them, stringing words together according to the calculated probability of one word following another, without any grasp of meaning. They are merely excellent at manipulating linguistic form. Big Tech companies have continued to invest in them because of the profits they promise. From a societal perspective, it would be more desirable to work on AI models that might achieve understanding, or that achieve good results with smaller, more carefully curated data sets (and thus consume less energy). But the authors fear that nothing beats the promise of profit, even though large language models come with yet another risk: because their output appears so meaningful, they could be used to generate misinformation.
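To make the “parroting” point concrete, here is a deliberately tiny, purely illustrative sketch: a toy bigram model, far simpler than BERT or GPT but resting on the same principle of predicting likely word sequences. It strings words together based on how often they followed one another in its training text, with no notion of meaning.

```python
import random
from collections import defaultdict

# Toy "stochastic parrot": a bigram model that continues a prompt by sampling
# the next word in proportion to how often it followed the previous word in
# the training text. It models word co-occurrence, not meaning.

training_text = "the cat sat on the mat and the dog sat on the rug"

followers = defaultdict(list)          # word -> list of words seen after it
words = training_text.split()
for prev, nxt in zip(words, words[1:]):
    followers[prev].append(nxt)

def generate(start: str, length: int = 8) -> str:
    out = [start]
    for _ in range(length):
        candidates = followers.get(out[-1])
        if not candidates:             # no continuation seen in training data
            break
        out.append(random.choice(candidates))  # frequency-weighted sampling
    return " ".join(out)

print(generate("the"))  # e.g. "the dog sat on the mat and the cat": fluent-looking, meaning-free
```

The output can look fluent while carrying no intent or understanding; scaled up by many orders of magnitude, that is the core of the “stochastic parrot” critique.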
December 2021: Establishment of the Distributed AI Research Institute (DAIR)
On the first anniversary of her exit from Google, Timnit Gebru published a press release announcing the launch of a new organization, the Distributed AI Research Institute (DAIR), designed as “an independent, community-rooted institute set to counter Big Tech’s pervasive influence on the research, development and deployment of AI.” The institute’s work is focused on the process and principles of AI research. One of its premises is that the dangers embedded in AI technology would be preventable if its production and deployment were based on the inclusion of communities and a greater diversity of perspectives. One of the institute’s current projects uses satellite imagery and computer vision to analyze the effects of spatial apartheid in South Africa. In another project, Datasheets for Datasets, Timnit Gebru works to establish industry standards, which currently don’t exist, for documenting machine learning datasets. The aim is to increase transparency and accountability within the machine learning community, mitigate biases in machine learning models and help researchers as well as practitioners choose the right dataset.
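As an illustration of what such documentation could look like in practice, here is a simplified, hypothetical template loosely following the question areas proposed in Datasheets for Datasets (motivation, composition, collection process, recommended uses). It is not an official format, and the dataset named in it is invented.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified datasheet template loosely inspired by the question
# areas in "Datasheets for Datasets". Real datasheets answer many more
# questions in free text; this only illustrates the idea of shipping
# structured documentation alongside the data.

@dataclass
class Datasheet:
    name: str
    motivation: str          # Why was the dataset created, and by whom?
    composition: str         # What do the instances represent? Any sensitive content?
    collection_process: str  # How and over what timeframe was the data collected?
    recommended_uses: str    # What is the dataset suitable (or unsuitable) for?
    known_limitations: list[str] = field(default_factory=list)

sheet = Datasheet(
    name="example-comments-corpus",  # hypothetical dataset
    motivation="Study moderation of online comments.",
    composition="1M comments scraped from public forums; may contain abusive language.",
    collection_process="Collected via public APIs between 2019 and 2021.",
    recommended_uses="Research on content moderation; not for training deployed classifiers.",
    known_limitations=["English only", "Skews toward US-based forums"],
)
print(sheet.name, "-", sheet.recommended_uses)
```

The point is that the documentation travels with the data, so anyone choosing a dataset can see how it was made and what it should not be used for.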
Chronicle of a Split Foretold
The developments that led to the foundation of the DAIR Institute aren’t just the story of a dismissal. They also raised awareness of a toxic culture in Big Tech.
On the evening of December 2, 2020, Timnit Gebru, the co-lead of Google’s Ethical AI team, announced via Twitter that the company had forced her out. She was known for co-authoring “Gender Shades,” a groundbreaking 2018 study on the gender and racial biases embedded in commercial face recognition systems, written while she was a researcher at Microsoft. The study showed facial recognition to be less accurate at identifying women and people of color, which means its use could end up discriminating against them. She also co-founded the Black in AI affinity group and champions diversity in the tech industry. Her critical work has frequently challenged mainstream AI practices.
Gebru’s departure was the result of a conflict over a paper she co-authored. Google executives asked her either to withdraw the still unpublished paper or to remove the names of all the Google employees from it (five of the six co-authors). Jeff Dean, the head of Google AI, told colleagues in an internal email (which he later shared on Twitter) that the paper “didn’t meet our bar for publication” because it “ignored too much relevant research.” Specifically, he said it didn’t mention more recent work on how to make large language models more energy efficient and mitigate problems of bias. However, the paper’s citation list contains 128 references. This fueled speculation among other actors in the field of AI ethics that Google had pushed Timnit Gebru out because the paper revealed some inconvenient truths about a core line of its research. More than 1,400 Google staff members and 1,900 other supporters signed a letter of protest after her dismissal.
The paper in question is called “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”. Emily M. Bender, a professor of computational linguistics at the University of Washington, was the only co-author who was not a Google researcher. The paper’s goal, Bender told MIT Technology Review, was to take stock of the landscape of current research in natural language processing. Google pioneered much of the foundational research for large language models: Google AI introduced the transformer architecture in 2017, which serves as the basis for the company’s later model BERT as well as OpenAI’s GPT-2 and GPT-3. BERT now also powers Google search, the company’s primary source of revenue. Bender worries that Google’s actions could create “a chilling effect” on future AI ethics research. Many of the top experts in AI ethics work at large tech companies because that is where they find work.
Two members of Google’s Ethical AI group have since left the company. Senior researcher Alex Hanna and software engineer Dylan Baker joined Timnit Gebru’s nonprofit research institute, Distributed AI Research (DAIR). Hanna announced her resignation in a post on Medium, in which she criticized the “toxic” work environment at Google and lamented the lack of Black women in the Google Research organization. She concluded: “In a word, tech has a whiteness problem. […] So in this sign-off, I encourage social scientists, tech critics, and advocates to look at the tech company as a racialized organization. Naming the whiteness of organizational practices can help deconstruct how tech companies are terrible places to work for people of color, but also enable an analysis of how certain pernicious incentives enable them to justify and reconstitute their actions in surveillance capitalist and carceral infrastructures.”
DR. ALEX HANNA
Director of Research at the Distributed AI Research Institute (DAIR)
She has worked extensively on the ethics of AI and on social movements. She serves as a co-chair of Sociologists for Trans Justice, as a Senior Fellow at the Center for Applied Transgender Studies and sits on the advisory board for the Human Rights Data Analysis Group and the Scholars Council for the UCLA Center for Critical Internet Inquiry.