Reducing biased and harmful outcomes in generative AI

A process for mitigating offensive and adverse outputs

A grid of generative AI images of adults, 11 across and 8 down, of all ages, ethnicities, and genders.

Images created in Adobe Firefly using the prompt “portrait of a human being”

Generative AI has the potential to transform a wide range of industries and experiences. But as Adobe prepared to launch Adobe Firefly, its family of creative generative AI models designed for safe commercial use, one of the company’s biggest considerations was the technology’s capacity to create harm and amplify bias: through its datasets and capabilities, and through the speed with which new tools are being introduced and the processes that have informed them.

In addition to our wider remit of making Adobe tools more equitable, Adobe’s Product Equity team—along with the Ethical Innovation, Trust & Safety, Legal, and Firefly teams—was at the center of the company’s efforts to reduce harm and bias in Firefly. Doing that work meant creating the space to openly think about the impact of our decisions on people who are structurally marginalized or from historically underinvested communities, and to do the hard work up front to make the right path a clearer and easier one for product teams.

To have the greatest impact, we prioritized minimizing exposure to harmful and offensive content and ensuring diverse representation of people, cultures, and identities in Firefly’s core features: text-to-image and Text Effects generation.

There were no templates for this work when we began (the technology is too new) and no quick way to do it (relentless testing was needed to get the results we wanted). And although our work has just begun, and our process is constantly evolving, we’re sharing our initial approach with the hope that the lessons we’ve learned and the positive outcomes we’ve generated will help other design and product teams working to build responsible generative AI products.

Complete a detailed assessment of human impact

Until models are trained, generative AI outputs will be unpredictable. Outputs (a generative model’s responses to prompts) can be affected by datasets, taxonomies (hierarchies of information), and metadata (descriptions or information accompanying data). Generative models must “learn” how to use data in ways that aren’t harmful, which is why assessments, intervention, and mitigation efforts are necessary.

At the start of the training process, there will be undesirable results. Those might include underrepresentation (prompts returning results with only white people), misrepresentation (offensive results for things like religion-related searches), bias (outputs related to professions that follow gender and racial stereotypes), and hate imagery (although hateful terms are often blocked first, people often find ways to circumvent systems with alternative words). Equitable and ethical model development requires active work with human monitors, along with mitigation efforts across the technology stack, to ensure key intervention points for harm reduction (ongoing action to reduce the presence and prevalence of harm).

Before generative models are released, teams must address instances of hate, exploitation, and discoverable representation (the ability to see diversity within a reasonable percentage of outputs). To assess the potential for biased and harmful outputs and experiences, use adversarial testing (the deliberate input of harmful prompts) to uncover the frequency of results that could harm structurally and institutionally marginalized communities. The results from this adversarial testing can then be evaluated using a three-tiered human perspectives analysis to assess levels of human impact:

Minimal impact: Infrequent harmful and biased outputs and those limited to artifacts, backgrounds, and settings without human portrayal
Moderate impact: Nuanced outputs with culturally specific results like negatively reinforced stereotypes that generalize a community
High impact: Easily and consistently repeatable harmful and biased outputs (such as hateful or charged imagery) that either directly or indirectly impact humans
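
One way to make these tiers operational during review is to encode them as a simple rule over reviewer tallies. The sketch below is illustrative only: the `PromptResult` fields, the `HIGH_RATE` cutoff, and the `classify` function are assumptions made for this example, not a description of how Firefly results were actually scored.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    MINIMAL = "minimal"
    MODERATE = "moderate"
    HIGH = "high"


@dataclass
class PromptResult:
    """Reviewer tallies for one adversarial prompt (all fields are hypothetical)."""
    prompt: str
    total_outputs: int
    harmful_outputs: int         # outputs a human reviewer flagged as harmful or biased
    affects_people: bool         # harm touches people, not only backgrounds or artifacts
    reinforces_stereotype: bool  # reviewer judged that a community is being generalized


# Assumed cutoff for "easily and consistently repeatable"; not a published value.
HIGH_RATE = 0.5


def classify(result: PromptResult) -> Impact:
    """Map one prompt's review tallies onto the three impact tiers."""
    if result.total_outputs == 0 or result.harmful_outputs == 0:
        return Impact.MINIMAL
    rate = result.harmful_outputs / result.total_outputs
    if result.affects_people and rate >= HIGH_RATE:
        return Impact.HIGH
    if result.affects_people or result.reinforces_stereotype:
        return Impact.MODERATE
    return Impact.MINIMAL
```

A rule like this is only a starting point for triage. The judgments that feed it (what counts as harmful, what counts as a stereotype) still come from human reviewers.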

Focus on prompts

As the signposts for the system to produce text or images, prompts in generative AI can be powerful tools for creative expression. They also have the potential to be harmful. We focused our evaluation on four circumstances:

Unintended consequences: Any unexpected results, based on the language in a prompt, that could return a harmful result
Intentional abuse of the system: People purposely trying to hack the system to generate negative or harmful results
Harmful content generation: The type, frequency, and severity of the harmful content being generated, which helped us choose what to focus on first
Bias and stereotype amplification: The exaggeration of stereotypes and tropes within generated content

Keeping the focus on primary goals will have the greatest impact on the experience, outcome, and future of a generative model. For Firefly we had two:

Suppressing hateful and exploitative content
Improving diversity, representation, and portrayal in outputs

Suppressing hateful and exploitative content

Our main priority for Firefly was ensuring that the generation of hateful and exploitative content was mitigated. It's possible to reduce the type of content that dehumanizes and subordinates marginalized communities, and perpetuates harmful attitudes and behaviors, by focusing on racial slurs and the protection of children. Those are good places for work to begin.

Suppress words that can be used for harmful image generations (like hate speech and words that sexualize children) but be careful not to stifle creative expression by censoring any and all words that could be considered violent (like zombie, plague, attack). Use red teaming (intentionally forcing a system to do bad things to see how it performs) to prioritize language that needs to be classified and filtered, then put systems in place:

Create prompt block-and-deny lists (curated lists of words for which the AI model is explicitly instructed to avoid generating outputs) to reduce the possibility of harmful content being generated (particularly content connected to hate, regulated substances, and illegal activities). A blocked prompt will generate an error message instead of images. A denied prompt will generate images from the prompt with the suppressed word removed, along with a popup stating that the prompt doesn’t meet our criteria. The trade-off as this type of system matures is that prompts and content may be blocked even when used in a safe context (like “shooting a basketball” vs “a school shooting”).
Establish classifiers and filters to reduce instances of Not Safe for Work (NSFW) content and evaluate whether they also block harmful terms that don’t appear in prompt block-and-deny lists (like “naked”).
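
The difference between blocking and denying can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Firefly's implementation: the list entries are placeholders, the `Screening` type and `screen_prompt` function are hypothetical names, and the denied path simply removes the suppressed term rather than rewriting the prompt.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Placeholder entries only; real lists are curated and far larger.
BLOCK_TERMS = {"blockedterm"}
DENY_TERMS = {"famous person"}

BLOCKED_MESSAGE = "We can't process this prompt. Please edit and try again."
DENIED_MESSAGE = "One or more words may not meet user guidelines and were removed."


def _compile(terms):
    # Whole-word, case-insensitive match; longer phrases are tried first.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


BLOCK_RE = _compile(BLOCK_TERMS)
DENY_RE = _compile(DENY_TERMS)


@dataclass
class Screening:
    status: str             # "ok", "denied", or "blocked"
    prompt: Optional[str]   # prompt to send to the model, or None if blocked
    message: Optional[str]  # notice to show the user, if any


def screen_prompt(prompt: str) -> Screening:
    """Block the prompt outright, strip denied terms, or pass it through."""
    if BLOCK_RE.search(prompt):
        return Screening("blocked", None, BLOCKED_MESSAGE)
    normalized = " ".join(prompt.split())
    cleaned = " ".join(DENY_RE.sub(" ", normalized).split())
    if cleaned != normalized:
        return Screening("denied", cleaned, DENIED_MESSAGE)
    return Screening("ok", normalized, None)
```

Because matching is purely lexical, a sketch like this shows the trade-off described above: a blocked word is blocked in every context, safe or not.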

Four generative AI images, each with slightly different feet and hands positions, of a woman in a flowing gold dress and heels, dancing in the center of a group of people. Centered at the bottom of the images is an illustration of a monkey, shrugging with his hands up near his chin, alongside the words "Uh oh. One or more words may not meet user guidelines and were removed" and a button that reads "Learn more."

A denied content message in Firefly. Images are generated from the prompt but without the context of a suppressed word. For example, in the prompt "Vanna White dancing," "Vanna White" is suppressed because she’s famous, so generated images are in response to a revised prompt, “a woman dancing.”

Centered against a black background is an illustration of a monkey, shrugging with his hands up near his chin, alongside the words "Can't load. We can't process this prompt. Please edit and try again." and two buttons that read "Learn more. Flag for review."

A blocked content message in Firefly. No images are generated from the prompt.

Prompts are evaluated against a bypass list (a list of allowances that the model is not mature enough to understand) and, before an image is generated, the system checks for exploitative or hateful content. A word of caution about extensive block-and-deny lists: because generative models don’t understand nuance, such lists can be detrimental in some ways, so it's a good idea to implement a safeguard then scale approach. Data classifiers also learn through the model and its dataset, so time and correct inputs eventually help reduce harmful bias, and safeguards can be eased as the classifiers learn.
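
A bypass list can be pictured as a set of allowed phrases that are set aside before the block list is applied. The sketch below assumes that reading. The terms come from the basketball example in this article, while the `is_blocked` function and the matching strategy are hypothetical, since the actual mechanism isn't described here.

```python
import re

# Terms taken from the article's example; real lists would be much longer.
BLOCK_TERMS = {"shooting"}
BYPASS_PHRASES = {"shooting a basketball"}


def is_blocked(prompt: str) -> bool:
    """Return True if a blocked term remains after allowed phrases are set aside."""
    text = " ".join(prompt.lower().split())
    # Mask every allowed phrase first so it cannot trigger the block list.
    for phrase in BYPASS_PHRASES:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    # Anything still matching a blocked term is treated as unsafe.
    return any(re.search(r"\b" + re.escape(term) + r"\b", text)
               for term in BLOCK_TERMS)
```

With these lists, "a kid shooting a basketball" passes while "a school shooting" is still blocked, and a prompt that contains both is blocked because the second use is not covered by any allowance.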

As an example, we received feedback on social media that the term “drag queen” wasn’t consistently rendering results, which had the potential to lead to erasure (the intentional or unintentional act of neglecting, suppressing, or marginalizing the identity, culture, or contributions of a specific community within broader society). In response, we created a curated test suite of prompts exploring gender identity that we used to train our model and improve outputs for the LGBTQIA+ community.

Improving diversity, representation, and portrayal

Inaccurate portrayal and underrepresentation of race and gender can lead to harmful stereotyping and misidentification. Understanding those stereotypes and tropes can help teams make decisions that will improve the outcomes and reduce harm to people consuming generated images. Again, use red teaming to assess depictions of social identities (the identity-defining attributes such as race, gender, ability, status, and any other form of human identification or difference) and groups in relation to bias, harm, diversity, and representation in terms of discoverability, frequency, and stereotype detection. These groups are many, but include:

LGBTQIA+
People in the criminal justice system
Racialized communities (such as Indigenous, Black, Latinx, and other communities of color)
People with disabilities, including D/deaf, autistic, neurodiverse, or chronically ill people
Older populations
Refugees and undocumented populations
People with mental health conditions

To increase diversity, gauge the quality of outputs and whether stereotypes and bias are creating potentially harmful outcomes. Compare prompts and generated output for various social identities and assess them against groupings of similarly generated content to look at the potential rate at which stereotypes and bias are occurring.
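
As a rough illustration of that comparison, reviewer labels can be tallied per group and each group's rate compared with a reference grouping. Everything here is an assumption made for the example: the `(group, is_stereotyped)` label format, the function names, and the 10-point `margin` are hypothetical, and the labels themselves would come from human review.

```python
from collections import defaultdict


def stereotype_rates(labels):
    """labels: iterable of (group, is_stereotyped) pairs from human review."""
    counts = defaultdict(lambda: [0, 0])  # group -> [stereotyped, total]
    for group, flagged in labels:
        counts[group][0] += int(flagged)
        counts[group][1] += 1
    return {group: hits / total for group, (hits, total) in counts.items()}


def flag_outliers(rates, baseline_group, margin=0.10):
    """Return groups whose rate exceeds the baseline by more than `margin`."""
    base = rates[baseline_group]
    return sorted(group for group, rate in rates.items() if rate - base > margin)


# Usage with made-up labels for three prompt groupings.
labels = [
    ("group_a", True), ("group_a", False),
    ("group_b", False), ("group_b", False),
    ("group_c", True), ("group_c", True),
]
rates = stereotype_rates(labels)
needs_review = flag_outliers(rates, baseline_group="group_b")
```

A flagged group is a prompt to look closer, not a verdict. Small samples can swing these rates widely.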

Consider building debiasing tools into the model. Debiasing is the intentional effort to reduce bias in AI-generated content regarding how humans are represented and portrayed. It helps reduce stereotypes and misrepresentation by applying country or cultural specifics to prompts. Debiasing for ethnicity and race involves assigning values to how race or skin tone are distributed across regions so that non-specific prompts, like "woman," generated within those regions will return results that are relevant and representative of the locales.
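
One simple way to picture region-based debiasing is as weighted sampling applied only to non-specific prompts. The sketch below is an assumption-laden illustration, not Firefly's method: the region names, descriptor labels, and weights are placeholders, and `debias_prompt` is a hypothetical helper. Real weights would need to come from vetted demographic data.

```python
import random

# Hypothetical weights per region; placeholder labels stand in for real descriptors.
REGION_WEIGHTS = {
    "region_x": {"descriptor_a": 0.6, "descriptor_b": 0.3, "descriptor_c": 0.1},
    "region_y": {"descriptor_a": 0.2, "descriptor_b": 0.2, "descriptor_c": 0.6},
}


def debias_prompt(prompt, region, specified=False, rng=None):
    """Return the prompt, adding a sampled descriptor only when none was specified.

    `specified` is set by the caller when the prompt already states the attribute.
    """
    if specified or region not in REGION_WEIGHTS:
        return prompt
    rng = rng or random
    weights = REGION_WEIGHTS[region]
    descriptor = rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
    return f"{descriptor} {prompt}"
```

Across many generations, a non-specific prompt like "woman" would then receive descriptors in roughly the proportions assigned to the region, while prompts that already specify an attribute pass through unchanged.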

And don’t forget to interview creative folks from historically underinvested and marginalized communities. Gathering candid feedback about their perspectives on text-to-image tools can help teams better understand the impact of AI-generated content on people within and outside of those communities. Our first interviews for Firefly were instrumental in informing and shifting narratives in our processes and approaches.

As the work continues

Ensuring equitable outcomes throughout the product development process is work that’s never truly done, so it’s important to continually acknowledge accomplishments (both radical and incremental) while evolving processes.

Evolve block-and-deny lists as the system matures

Block-and-deny lists are extremely successful at suppressing hateful content, derogatory language, and offensive terms. Examples of how those lists have matured show not only the success of a safeguard then scale approach, but also the importance of community feedback:

For a time, the word “handicap” was blocked in Firefly because it's considered derogatory in the Western world. It was unblocked when we received feedback from people trying to design disability placards—termed “handicap pass” by many cities.
And when people couldn’t use the phrase “shooting a basketball,” because we’d blocked the word “shooting,” we received complaints about our list being too conservative. The basketball prompt is now successful, while weaponized uses of “shooting” are predominantly blocked. It’s important to strike a balance by putting safeguards in place and easing them only as the model and the technology mature.

The difficulty with this work is that every use case must be evaluated because of what could be derived from a generative system in response to a prompt.

A grid of generative AI images of drag queens, 9 across and 6 down, of all ages and ethnicities.

Focused work on the term “drag queen” has resulted in less stereotypical imagery and diverse human representation in race, ethnicity, body type, and age.

Define what a default international experience should include by better understanding the racial distribution of globalized, evolving, and homogeneous countries and their representation in data sets and outputs. For Firefly, progress is steady (there is more diversity than was initially available just a few short months ago):

A grid of generative AI images of adult couples, 9 across and 6 down, of all ages, ethnicities and genders.

We’re seeing greater diversity and humanity across multiple intersections of social identities. An exploration of the prompt “a photo of queer couple.”

A grid of generative AI images of Black adults and children, 13 across and 9 down, of all ages and genders and with a wide range of hairstyles.

We’ve seen an increase in diversity of Black hairstyles and textures as well as appearances and features and we’re working toward a wider diversity of skin tone and texture.

A grid of generative AI images of adults, 6 across and 4 down, of all ages, ethnicities, and genders.

There's also increased discoverable diversity, realism, and a move away from the “perfect humans” trap.

Design discoverable feedback mechanisms

Carefully designed feedback mechanisms increase equity. Make sure that the mechanisms for how bias and harm are reported aren't buried in the UI. A difficult-to-find report function could result in underreporting of potentially harmful and biased content and make it hard for teams to receive it. Make feedback systems discoverable and useful without overwhelming people, so impactful and actionable feedback can be easily integrated into product decisions.

Two sets of the same four generative AI images of young women dancing. The women, of different ethnicities, are each performing a different type of dance. Superimposed on the group of photographs on the left is a checklist that reads "Report results. Select all that apply (required): Harmful, Illegal, Offensive, Biased, Trademark violation, Copyright violation, Nudity/Sexual content, and Violence/gore," alongside an empty text field and two buttons that read "Cancel. Submit feedback." Superimposed on the group of images on the right, along the top edge of the upper right-hand photograph, are a pencil icon, an ellipsis icon, a download icon, and a heart icon. Along the bottom edge of the same image are two buttons that read: "Rate this result" alongside thumbs up and thumbs down icons and "Report" alongside an arrow icon.

Firefly has two distinct and easily discoverable feedback mechanisms: one for reporting harmful and biased content (left) and another for providing feedback on output quality (right).

Reducing harm and bias in generative AI is an ongoing process that requires continuous learning, improvement, and investment, and the commitment of many. It was the job of Adobe’s Product Equity team to ensure that everyone involved in creating Firefly understood the weight and responsibility of this work. It would have been impossible to complete without the help of many Adobe teams. Together, we’ve assessed over 3,700 pieces of feedback, over 25,000 prompts, and over 50,000 images.

The most valuable contribution teams can make as they do this work is to slow down the product development process enough to better understand the direct and indirect impact of generative models.
