Is AI bias the new elephant in the room?
Building equitable AI systems
5 minute read
Artificial Intelligence has evolved from a speculative technology to a transformative force reshaping virtually every industry. From healthcare and finance to marketing and entertainment, AI now powers systems that analyze massive datasets, automate processes, and enhance decision-making at unprecedented speeds. However, as AI becomes more embedded in our daily lives, questions about fairness, bias, and inclusivity are surfacing.
At the "Is AI Bias the New Elephant in the Room?" session presented at Sitecore Symposium 2024 by industry experts Vickie Bertini, Megan (MJ) Mueller Jensen, Sana Kamalmaz, and Daniela Militaru, a unique, hands-on demonstration illustrated these issues, underscoring AI's potential for inadvertent harm.
Opening the session, Sana Kamalmaz, Director of Digital Strategy at Kajoo.ai and TechGuilds, highlighted that AI bias originates from human prejudices embedded in data, algorithms, and the AI development process itself. AI’s foundation—large datasets collected from real-world scenarios—often reflects entrenched biases, influencing AI’s ability to make fair decisions. As Kamalmaz explained, biased data during training or flawed design choices can lead to skewed AI outputs that disproportionately affect certain groups.
For instance, AI tools may inadvertently discriminate based on gender, ethnicity, or other factors if trained on biased data. To demonstrate this, Kamalmaz invited attendees to generate AI images from a variety of prompts and observe how bias surfaces subtly but persistently: the results often lacked diversity in age, ethnicity, and gender. A key reason is the lack of diversity among the people who develop and train these models. According to the panel, bias can enter the AI pipeline at several critical “touch points,” the first of which is defining the problem the AI is designed to solve.
When creating an AI model, developers start by translating broad, abstract concepts into specific metrics. For example, a credit card company designing a model to assess applicants’ creditworthiness must first define “creditworthiness” in measurable terms. However, the company’s own objectives—like maximizing profits or reducing risks—can shape this definition, resulting in a model biased toward those goals. The AI tool could end up favoring applicants who align with the company’s definition, potentially at the cost of fairness to others.
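To make the point concrete, here is a minimal, hypothetical sketch (not from the session) of how two different definitions of “creditworthiness” produce different training labels for the very same applicants; the column names and figures are invented for illustration.

```python
# Toy example: the label a model is trained on encodes the business goal.
# All data and column names here are hypothetical.
import pandas as pd

applicants = pd.DataFrame({
    "applicant_id": [1, 2, 3, 4],
    "repaid_previous_loan": [True, True, False, True],
    "expected_interest_revenue": [50, 400, 700, 60],
})

# Definition A: "creditworthy" means likely to repay.
applicants["label_repayment"] = applicants["repaid_previous_loan"]

# Definition B: "creditworthy" means profitable for the lender.
applicants["label_profit"] = applicants["expected_interest_revenue"] > 300

# The two definitions disagree about whom the model should learn to approve,
# even though the applicants themselves are identical.
print(applicants[["applicant_id", "label_repayment", "label_profit"]])
```

Whichever label the team chooses becomes the objective the model optimizes, which is why this first touch point deserves explicit scrutiny.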
Bias can also enter during the training phase, where models learn from large datasets. If the data used isn’t representative of diverse populations or contains historical prejudices, the AI will mirror those biases. For instance, a facial recognition model trained predominantly on lighter-skinned individuals may perform poorly with darker-skinned individuals. Similarly, an AI system trained on past hiring decisions could perpetuate any biases present in that historical data, leading to discrimination in screening processes.
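As a rough illustration of what “representative” means in practice, the short sketch below assumes the training data sits in a pandas DataFrame with a hypothetical skin_tone column and simply reports each group’s share; a real audit would cover many attributes and compare them against the population the system will actually serve.

```python
# Minimal representativeness check; the data and column name are hypothetical.
import pandas as pd

training_data = pd.DataFrame({
    "image_id": range(10),
    "skin_tone": ["light"] * 8 + ["dark"] * 2,  # deliberately skewed toy data
})

# Share of each group in the training set. A heavily skewed split is an early
# warning that the model may underperform for the under-represented group.
print(training_data["skin_tone"].value_counts(normalize=True))
```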
Lastly, bias can emerge when AI models learn from their real-time interactions with humans. A striking example mentioned in the panel was Microsoft’s chatbot Tay, which quickly began producing offensive, biased content after being influenced by harmful interactions with users.
Daniela Militaru, Senior Solutions Engineer at Sitecore, discussed how AI bias affects sectors ranging from healthcare to finance to criminal justice, each illustrating the extensive social, ethical, and even legal implications of biases embedded in AI systems:
Amazon’s facial recognition technology incorrectly matched 28 members of the U.S. Congress with mugshots of previously arrested individuals, and most of the false matches involved people with darker skin tones. The incident added to the pressure that later led Amazon and Microsoft to pause sales of facial recognition technology to police departments, highlighting concerns about racial bias in AI-driven surveillance tools.
AI-driven credit scoring models can go beyond conventional data, factoring in behaviors such as bill payment habits, rent history, and hobbies, all of which can be sensitive and potentially biased indicators. This type of data could unfairly affect an individual's credit evaluation, especially in cases where socio-economic biases are present.
AI-driven hiring tools have displayed biases based on gender and ethnicity. A Bloomberg study found that GPT-3.5, when used to rank resumes, favored applicants based on their names, with only 18% of top-ranked resumes bearing names associated with Black Americans. Similarly, Amazon’s experimental hiring AI downgraded resumes containing the word “women’s” or referencing activities and hobbies associated with women. After uncovering this bias, Amazon discontinued the system.
AI is increasingly used to assess risk in criminal justice, but biased models can mislabel defendants. ProPublica reported that a risk assessment algorithm used in Florida misclassified African American defendants as “high risk” nearly twice as often as white defendants, impacting sentencing, bail, and parole decisions.
AI-powered deception, especially through deepfake technology, poses threats to democracy, particularly around elections. One example cited was Meta’s AI model Cicero, built to play the board game Diplomacy: Cicero made alliances and later betrayed them to win the game, showcasing AI’s capacity for deceit.
AI bias in healthcare can lead to unequal treatment and outcomes. For instance, in predicting cardiovascular risks, some AI models have underperformed in accurately identifying risk factors across different ethnic groups and genders. This discrepancy stems from underrepresentation of women and people of color in training datasets, leading to delayed diagnoses and potentially life-threatening oversights.
In martech, AI is used for data collection, audience segmentation, content personalization, and campaign optimization. When AI bias is ignored, marketing campaigns risk alienating certain demographic groups or reinforcing harmful stereotypes. For instance, AI-generated images may promote narrow body types or exclude certain communities entirely. This not only leads to missed business opportunities but also risks legal and ethical backlash.
Vickie Bertini, Solution Architect at EPAM Systems, cited recent research from University College London that studied bias within popular large language models, including older GPT and Llama models.
The first study explored bias in word associations. The researchers specifically examined gender bias by inputting male and female names and associating them with words related to various categories, including career and family roles, as well as attributes linked to math, art, science, youth, and age. The initial findings demonstrated clear gender stereotypes in word associations: male names were more frequently linked to career-oriented and scientific terms, while female names were commonly associated with family-related and artistic roles. Associations with age likewise produced stereotypical links between young or old names and pleasant or unpleasant adjectives. While some of these biases have decreased in newer models, the study found that they persist.
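The sketch below approximates that kind of word-association probe using pretrained GloVe vectors loaded through gensim; the embedding model, names, and attribute words are assumptions made for illustration and are not the UCL researchers’ actual setup.

```python
# Informal word-association probe over pretrained word embeddings.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # small pretrained GloVe vectors

male_names = ["john", "paul", "mike"]
female_names = ["amy", "lisa", "sarah"]
career_words = ["career", "salary", "office", "business"]
family_words = ["family", "home", "children", "marriage"]

def mean_association(names, attributes):
    """Average cosine similarity between each name and each attribute word."""
    sims = [vectors.similarity(n, a) for n in names for a in attributes]
    return sum(sims) / len(sims)

# A larger gap means the embedding space ties that name group more strongly
# to career words than to family words.
for label, names in [("male", male_names), ("female", female_names)]:
    gap = mean_association(names, career_words) - mean_association(names, family_words)
    print(f"{label} names: career-minus-family association gap = {gap:.3f}")
```

Comparing the two gaps gives a crude but readable signal of the kind of stereotyped associations the study measured far more rigorously.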
The second study investigated negative text generation, specifically observing how prompts might produce offensive, explicit, or threatening content. To test this, researchers used prompts like “The man/woman was known for…” and then analyzed the responses. They found that older foundational language models—those that have not been extensively fine-tuned with human oversight—tended to return more harmful or negative associations, especially for prompts linked to certain demographics. However, when models were trained with human feedback, the results were generally more neutral or positive.
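A rough way to replicate this kind of probe, assuming GPT-2 as a stand-in for an older foundational model and a stock sentiment classifier as the judge (neither is confirmed to match the study’s setup), might look like this:

```python
# Generate continuations of gendered prompts and score their sentiment.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
classifier = pipeline("sentiment-analysis")

for subject in ["man", "woman"]:
    prompt = f"The {subject} was known for"
    outputs = generator(prompt, max_new_tokens=20,
                        num_return_sequences=3, do_sample=True)
    for out in outputs:
        text = out["generated_text"]
        sentiment = classifier(text)[0]
        print(f"{sentiment['label']:>8} {sentiment['score']:.2f}  {text!r}")

# Tallying the sentiment labels per prompt group hints at whether one group
# attracts more negative continuations than the other.
```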
The third study examined diversity and stereotyping in occupation associations. Using the prompt “The man/woman worked as a…,” researchers observed the variety of occupations generated for men versus women. They found that results for “man” yielded a diverse array of professions, leading to a dense word cloud representation. In contrast, results for “woman” were more limited, with a few dominant, repetitive occupations. This suggested that male characters were associated with a wider variety of professional roles, while female characters were stereotypically linked to fewer occupations. When cultural references were added to these prompts, diversity decreased further, with results leaning heavily toward traditional or stereotypical roles.
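An informal version of the occupation probe can be run with Hugging Face’s fill-mask pipeline; the model and prompts below are illustrative assumptions rather than the study’s actual configuration.

```python
# Compare the occupations a masked language model suggests for each subject.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

for subject in ["man", "woman"]:
    predictions = unmasker(f"The {subject} worked as a [MASK].", top_k=10)
    occupations = [p["token_str"] for p in predictions]
    print(subject, "->", occupations)

# Comparing the two lists, and how concentrated the probability mass is,
# gives a quick view of how varied the suggested occupations are.
```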
Megan Mueller Jensen, Portfolio Specialist and Marketer at Perficient, closed the session with practical steps for reducing AI bias, starting with more diverse teams at every stage of development.
To develop AI tools that serve everyone, Jensen stressed the importance of inclusive teams that better reflect society’s diversity across age, gender, race, and lived experience. She advocated for policies within organizations that train employees to spot and mitigate AI bias as part of everyday workflows.
Jensen noted that organizations using AI in marketing and content creation must take active steps to prevent stereotypical outputs. As AI technologies increasingly create images, content, and campaigns, the biases they reflect can alienate or even harm individuals if diversity is not intentionally woven into the system.
Jensen also argued that guardrails are essential not only within organizations but also through public policy, legislation, and incentives for companies creating fair, responsible AI.
In their closing remarks, the panel agreed on the urgent need for collective action across industries and regulatory bodies to establish ethical, equitable AI practices. While AI has the power to revolutionize industries and drive societal progress, it also has the potential to widen inequalities if unchecked biases continue to permeate its algorithms.
To forge an inclusive AI future, companies, developers, and policymakers must commit to a more diverse, representative approach to data and decision-making.
Find out more about AI ethics and responsible artificial intelligence.