HyperAIHyperAI
Back to Headlines

AI Medical Tools Worsen Care for Women and Minorities Due to Biased Training Data

4 days ago

AI-powered medical tools are increasingly showing troubling biases that lead to worse health outcomes for women and underrepresented racial and ethnic groups, according to a growing body of research. The root of the problem lies in decades of underrepresentation in clinical trials and medical research, where white men have historically been the primary subjects. As a result, the data used to train today’s artificial intelligence models often lacks diversity, embedding systemic biases into the algorithms that now assist doctors and healthcare providers. A recent report by the Financial Times highlights findings from a study conducted by researchers at the Massachusetts Institute of Technology, which revealed that large language models—including OpenAI’s GPT-4 and Meta’s Llama 3—were more likely to recommend less intensive care for female patients. Women were significantly more often advised to “self-manage at home” compared to men with similar symptoms, resulting in reduced access to clinical interventions. This trend persisted even in healthcare-specific models. The study found that Palmyra-Med, a medical-focused LLM, exhibited similar gender-based disparities in treatment recommendations. Further evidence comes from research by the London School of Economics on Google’s Gemma model, which showed that the AI tended to downplay women’s health needs relative to men’s. Another study published in The Lancet analyzed OpenAI’s GPT-4 and found it frequently relied on demographic stereotypes when making diagnoses and treatment plans. The model associated certain races, ethnicities, and genders with specific health conditions or procedures, leading to recommendations that were more influenced by identity than by actual symptoms. The report concluded that the model’s assessments were significantly linked to demographic attributes, including a higher likelihood of recommending more expensive procedures for certain groups. Compounding the issue, prior research has shown that AI models often fail to deliver equal levels of empathy and compassion when addressing mental health concerns for people of color compared to white patients. These biases are not just theoretical—they have real-world consequences. As companies like Google, Meta, and OpenAI push to integrate their AI tools into hospitals and clinics, the stakes rise dramatically. The risks are not limited to subtle biases. Earlier this year, Google’s Med-Gemini AI made headlines for fabricating a non-existent body part during a diagnostic simulation—a clear case of hallucination that a trained clinician could likely spot. But the more insidious dangers lie in the subtle, unconscious patterns that may go undetected. When an AI consistently recommends less aggressive treatment for women or underrepresents the severity of symptoms in marginalized patients, it may appear as a neutral or logical suggestion—until it leads to delayed care, misdiagnosis, or worse. The integration of AI into healthcare offers tremendous potential, but without rigorous attention to data diversity, bias testing, and human oversight, these tools risk entrenching and even amplifying long-standing inequities in medicine. As AI becomes more central to clinical decision-making, the medical community must act now to ensure these systems do not become another barrier to equitable care.

Related Links