“Think about equity issues at the very, very beginning,” Lin said. “If you think about this once the technology is fully fleshed out … it’s often very difficult to go back and ‘tweak’ something.”
One of the principles Lin said he follows to tackle potential bias at the start of a project is ensuring there are diverse stakeholders weighing in on the design of the AI tool, as well as how it's deployed. That means including developers with diverse backgrounds, as well as the perspectives of those who will be affected by an AI rollout, such as clinicians and patients.
HEA3RT recently worked on a project testing an AI chatbot that could collect a patient’s medical history before an appointment.
While some patients responded well to the chatbot, others said they wouldn't feel as comfortable giving sensitive health data to a machine, according to Lin. Generally, younger and healthier patients tend to be more comfortable conversing with a chatbot than older patients who have multiple or more complex chronic conditions, he added.
If a chatbot like this were rolled out to patients, it would also be important to make sure it could interact with patients who aren't fluent in English.
To ensure ethical considerations like equity are addressed from the start, Mount Sinai Health System in New York City is building an AI ethics framework led by bioethics experts. Bioethicists have researched health disparities and bias for decades, said Thomas Fuchs, dean of AI and human health at the Icahn School of Medicine at Mount Sinai.
The framework will use the WHO’s ethics and governance report as a foundation.
“AI brings new challenges,” Fuchs said. “But very often, it also falls into categories that have already been addressed by previous ethics approaches in medicine.”
Pinpoint the right outcome to predict
Independence Blue Cross, a health insurer in Philadelphia, develops most of its AI tools in-house, so it’s important to be aware of the potential for bias from start to finish, said Aaron Smith-McLallen, the payer’s director of data science and healthcare analytics.
Since 2019, Independence Blue Cross has been working with the Center for Applied AI at the University of Chicago Booth School of Business. The center provides free feedback and support to healthcare providers, payers and technology companies that are interested in auditing specific algorithms or setting up processes to identify and mitigate algorithmic bias.
Working with the Center for Applied AI has helped data scientists at Independence Blue Cross systematize how they think about bias and where to add checks and balances, such as tracking which types of patients an algorithm tends to flag, whether that matches expectations, and what the implications of a false positive or false negative could be.
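A minimal sketch of what such a check might look like, assuming a simple table of members with hypothetical columns for demographic group, whether the algorithm flagged them, and whether they actually needed care. It illustrates the idea of comparing flag rates and error rates across groups; it is not Independence Blue Cross's actual process or data model.

```python
# Hypothetical bias check: compare flag rates and error rates across groups.
# Column names ("group", "flagged", "needed_care") are invented for illustration.
import pandas as pd

def audit_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize, per group, how often members are flagged and how often flags are wrong."""
    rows = []
    for group, g in df.groupby("group"):
        flagged = g["flagged"].astype(bool)
        needed = g["needed_care"].astype(bool)
        rows.append({
            "group": group,
            "flag_rate": flagged.mean(),
            "false_positive_rate": (flagged & ~needed).sum() / max((~needed).sum(), 1),
            "false_negative_rate": (~flagged & needed).sum() / max(needed.sum(), 1),
        })
    return pd.DataFrame(rows)

# Large gaps in these rates between groups are a signal to investigate further.
members = pd.DataFrame({
    "group":       ["A", "A", "A", "B", "B", "B"],
    "flagged":     [1, 1, 0, 0, 0, 1],
    "needed_care": [1, 0, 1, 1, 1, 1],
})
print(audit_flags(members))
```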
As developers move through the stages of creating an algorithm, it's essential to continuously ask “why are we doing this?” Smith-McLallen said. That answer should inform what outcomes an algorithm predicts.
Many of the algorithms used at Independence Blue Cross flag members who could benefit from outreach or care management. To get to that outcome, the algorithms predict which members are at risk for poor health outcomes.
That's been a major takeaway from the Center for Applied AI's work with healthcare organizations: the need to carefully think through which outcomes an algorithm predicts.
Algorithms that use proxies, or variables that approximate other outcomes, to reach their conclusions are at high risk of unintentionally adding in biases, said Dr. Ziad Obermeyer, an associate professor in health policy and management at the University of California, Berkeley and head of health and AI research at the Center for Applied AI.
The center launched in 2019 in the wake of a study published by its staff, including Obermeyer, which found that a widely used algorithm for population health management (a predictive model that doesn't use AI) dramatically underestimated the health needs of the sickest Black patients and assigned healthier white patients the same risk scores as Black patients who had poorer lab results.
The algorithm flagged patients who could benefit from additional care-management services, but rather than predicting patients' future health conditions, it predicted how much patients would cost the hospital. That created a disparity: Black patients generally use healthcare services at lower rates than white patients, even at the same level of health need, so predicting cost understated how sick they were.
Developers need to be “very, very careful and deliberate about choosing the exact variable that they’re predicting with an algorithm,” Obermeyer said.
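As a toy illustration of that point, the label an algorithm is trained to reproduce changes who it flags. The data and column names below are invented, not drawn from the study; the sketch assumes one patient whose spending is low despite high need, for example because of access barriers.

```python
# Hypothetical illustration of a proxy label: ranking patients by future cost
# versus by a more direct measure of health need flags different people.
import pandas as pd

patients = pd.DataFrame({
    "patient_id":         ["A", "B", "C", "D"],
    "chronic_conditions": [4, 4, 1, 1],                    # closer to true health need
    "future_cost":        [20_000, 4_000, 9_000, 3_000],   # proxy label
})
# Patient B is as sick as patient A but spends far less on care.

flag_by_cost = patients.nlargest(2, "future_cost")["patient_id"].tolist()
flag_by_need = patients.nlargest(2, "chronic_conditions")["patient_id"].tolist()

print("Flagged when the target is cost:", flag_by_cost)   # ['A', 'C']
print("Flagged when the target is need:", flag_by_need)   # ['A', 'B']
```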
It's not always possible to predict exactly what an organization wants, especially with problems as complex as medical care. But keeping track of the information an organization would ideally want from an algorithm, what the algorithm is actually doing, and how those two things compare can help ensure the algorithm matches the “strategic purpose,” if not the exact variable.
Another common challenge is failing to acknowledge the various root causes that contribute to a predicted outcome.
There are many algorithms that predict “no-shows” in primary care, which staff might use to decide when to double-book appointments, Obermeyer said as an example. But while some of those patients are likely voluntary no-shows who cancel appointments because their symptoms go away, others struggle to get to the clinic because they lack transportation or can't get time off work.
“When an algorithm is just predicting who’s going to no-show, it’s confusing those two things,” Obermeyer said.
Once a health system has an AI tool, even one that's validated and accurate, the work isn't done. Executives have to think critically about how to actually deploy the tool into care and use the insights the AI draws out.
For an algorithm predicting no-shows, for example, developers might create a way to tease out voluntary and involuntary no-shows and handle those two situations in different ways.
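A minimal sketch of what that separation might look like in practice, assuming developers can produce separate risk estimates for voluntary and involuntary no-shows. The labels, threshold and interventions here are hypothetical placeholders, not a description of any deployed system.

```python
# Hypothetical routing logic: respond differently to the two kinds of no-show
# risk instead of treating every predicted no-show as a reason to double-book.
from dataclasses import dataclass

@dataclass
class NoShowRisk:
    p_voluntary: float    # e.g., symptoms resolved, patient skips the visit
    p_involuntary: float  # e.g., no transportation, can't get time off work

def choose_intervention(risk: NoShowRisk, threshold: float = 0.5) -> str:
    """Pick a response based on which kind of no-show risk is elevated."""
    if risk.p_involuntary >= threshold:
        return "offer transportation help, a telehealth visit or an evening slot"
    if risk.p_voluntary >= threshold:
        return "send a reminder with an easy way to cancel or reschedule"
    return "no action needed"

print(choose_intervention(NoShowRisk(p_voluntary=0.2, p_involuntary=0.7)))
print(choose_intervention(NoShowRisk(p_voluntary=0.8, p_involuntary=0.1)))
```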