AI bias tests gloss over a crucial aspect of skin color, Sony research claims

Illustration by Alex Castro / The Verge

While the AI industry has focused on reducing algorithmic bias with respect to the lightness or darkness of people’s skin tones, new research from Sony argues that red and yellow skin hues should also be taken into account. In a paper published last month, authors William Thong and Alice Xiang from Sony AI, along with Przemyslaw Joniak from the University of Tokyo, put forward a more “multidimensional” measurement of skin color in the hope that it might lead to more diverse and representative AI systems.

Researchers have been drawing attention to skin color biases in AI systems for years, including in an important 2018 study from Joy Buolamwini and Timnit Gebru that found AI was more prone to inaccuracies when used on darker-skinned females. In response, companies have stepped up efforts to test how accurately their systems work with a diverse range of skin tones.

Google, for example, promoted the introduction of the Monk Skin Tone Scale last year, which uses a 10-point scale to measure a diverse array of skin tones ranging from dark to light. Another commonly used measure is the Fitzpatrick scale, which consists of six categories, and which Meta has said it’s used in previous research.

The problem, according to Sony’s research, is that both scales focus primarily on the lightness or darkness of skin tone. “If products are just being evaluated in this very one-dimensional way, there’s plenty of biases that will go undetected and unmitigated,” Alice Xiang, Sony’s global head of AI Ethics, tells Wired. “Our hope is that the work that we’re doing here can help replace some of the existing skin tone scales that really just focus on light versus dark.” In a blog post, Sony’s researchers specifically note that current scales don’t take into account biases against “East Asians, South Asians, Hispanics, Middle Eastern individuals, and others who might not neatly fit along the light-to-dark spectrum.”

A grid of anonymized individuals showing a range of skin colors. (Image: Sony)

Two face datasets, showing the spread of skin color tones and hues they contain. (Image: Sony)

As an example of the impact this measurement can have, Sony’s research found that common image datasets overrepresent people with skin that’s lighter and redder in color, and underrepresent people with darker, yellower skin. This imbalance can make AI systems less accurate. Sony found that Twitter’s image cropper and two other image-generating algorithms favored redder skin, Wired notes, while other AI systems mistakenly classified people with a redder skin hue as “more smiley.”

Sony’s proposed solution is to adopt an automated approach based on the preexisting CIELAB color standard, which would also eschew the manual categorization approach used with the Monk scale.
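To make the CIELAB idea concrete, here is a minimal Python sketch, not Sony’s code. It assumes a skin sample can be summarized by two CIELAB-derived numbers: lightness (L*, from 0 for black to 100 for white) and hue angle (the angle of the color in the a*/b* plane, where smaller angles lean red and larger ones lean yellow). The conversion uses the standard sRGB-to-CIELAB formulas under the D65 white point; the function names are hypothetical, and a real pipeline would presumably average over skin pixels picked out by a segmentation step, which is omitted here.

```python
import math

# D65 reference white (CIE 1931 2-degree observer), scaled so Y = 1.0.
D65_WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def srgb_to_lab(r: int, g: int, b: int) -> tuple[float, float, float]:
    """Convert an 8-bit sRGB color to CIELAB (L*, a*, b*) under D65."""
    rl, gl, bl = (srgb_to_linear(v / 255.0) for v in (r, g, b))
    # Linear sRGB -> CIE XYZ using the standard sRGB/D65 matrix.
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    def f(t: float) -> float:
        # Piecewise CIELAB companding: cube root above a small threshold,
        # a linear segment below it to avoid an infinite slope near zero.
        delta = 6 / 29
        if t > delta ** 3:
            return t ** (1 / 3)
        return t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), D65_WHITE))
    lightness = 116 * fy - 16
    a_star = 500 * (fx - fy)
    b_star = 200 * (fy - fz)
    return lightness, a_star, b_star


def skin_color_measures(r: int, g: int, b: int) -> tuple[float, float]:
    """Hypothetical helper: return (L* lightness, hue angle in degrees).

    Assumption: a two-number summary of lightness plus hue, illustrating
    the "light/dark plus red/yellow" idea rather than Sony's exact method.
    """
    lightness, a_star, b_star = srgb_to_lab(r, g, b)
    hue_deg = math.degrees(math.atan2(b_star, a_star))
    return lightness, hue_deg
```

Calling `skin_color_measures(255, 0, 0)` gives a hue angle of roughly 40°, while `skin_color_measures(255, 255, 0)` gives roughly 103°, so the second number separates redder from yellower colors independently of lightness. Because both values are computed from pixel data rather than chosen by an annotator, this kind of measure avoids the manual categorization step the article associates with the Monk scale.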

Although Sony’s approach is more multifaceted, part of the point of the Monk Skin Tone Scale — which is named after creator Ellis Monk — is its simplicity. The system is intentionally limited to 10 skin tones to offer diversity without risking the inconsistencies associated with having more categories. “Usually, if you got past 10 or 12 points on these types of scales [and] ask the same person to repeatedly pick out the same tones, the more you increase that scale, the less people are able to do that,” Monk said in an interview last year. “Cognitively speaking, it just becomes really hard to accurately and reliably differentiate.”

Monk also pushed back against the idea that his scale doesn’t take undertones and hue into account. “Research was dedicated to deciding which undertones to prioritize along the scale and at which points,” he tells Wired.

Nevertheless, Wired reports that a couple of major AI players have welcomed Sony’s research, with both Google and Amazon noting that they’re reviewing the paper.
