How Generative AI Systems Reinforce Existing Power Structures
In both their development and their use, AI systems are always embedded in a specific social context. That also applies to large text- or image-generating models, whose output suggests that we are being presented with facts or realistic images. Ultimately, though, that output is little more than realistic-seeming content that disseminates specific cultural codes.
Large AI text-to-image generators like Stable Diffusion, DALL·E 2 or Midjourney are trained with vast quantities of data. They analyze frequently appearing patterns, such as the typical proportions of a face, or what landscape pictures tend to look like. When they then produce their own images of faces or landscapes, they may reflect biases that were present in the training data (such as distortions of human facial features stemming from racist caricatures) or false representations (like including typically Western architecture in an image of a city from a completely different part of the world with a radically different skyline).
Biased training data, though, isn’t the only problem. Many visual models produce far less realistic images of Black women than of white women; the results more often contain distortions and outright errors, as the artists Stephanie Dinkins and Minne Atairu discovered. Some image-generator providers have reacted to such potentially harmful outputs (and the latent racism they reveal) by blocking specific prompts (the requests made to the AI system to generate a particular image). The artist Auriea Harvey discovered, for example, that some image-generation tools block prompts like “slave” or “slave ship.” But such blocks do more to conceal the problem than to solve it. Indeed, they amount to suppressing historical realities, which can itself magnify cultural dominance by erasing the perspectives and experiences of minorities.
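To see why prompt blocking conceals rather than solves the problem, consider a minimal sketch of keyword-based filtering of the kind such reports describe. The term list and matching logic here are illustrative assumptions, not any provider’s actual implementation:

```python
# Hypothetical sketch of a keyword-based prompt blocklist.
# The terms and matching rules are assumptions for illustration only.

BLOCKED_TERMS = {"slave", "slave ship"}  # assumed blocklist entries

def is_blocked(prompt: str) -> bool:
    """Return True if the prompt contains any blocked term."""
    text = prompt.lower()
    return any(term in text for term in BLOCKED_TERMS)

print(is_blocked("a painting of a slave ship"))    # blocked: True
print(is_blocked("an 18th-century cargo vessel"))  # not blocked: False
```

The sketch makes the tradeoff visible: a paraphrase slips past the filter entirely, while legitimate historical subjects are refused wholesale. The block hides the topic rather than addressing the bias in what the model generates.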
Such cultural dominance doesn’t necessarily manifest itself through discrimination. Western norms are often subtly imposed on the users of image generators – for example, in the way people smile in AI-generated pictures. Even as primal a form of expression as the smile can be perceived and responded to differently by people from different cultures.
The same risk – that of propagating a hegemonic monoculture – is also present with text generators like ChatGPT. Different languages describe human experience in their own ways. But the wealth of smaller languages risks being lost to this algorithmic hegemony, because training requires vast amounts of data from books, magazines, newspapers and online content – a volume that smaller languages simply cannot provide, particularly those that are only spoken. It’s no secret that English is the dominant language of technology and that many smaller languages are falling by the wayside in AI applications. The team behind Stable Diffusion even notes in its own model card that the vast majority of the training data is in English, and that prompts in other languages don’t work as well.