Researchers at Mass General Brigham have found that fine-tuned generative artificial intelligence models can extract social determinants of health data from doctors’ notes in the electronic health record.
The peer-reviewed study from researchers at the Boston-based system highlights a potentially important use case for generative AI as providers work toward a more well-rounded understanding of the factors that affect a patient's health.
“The EHR has a huge amount of information and it's becoming quite untenable for clinicians to go through every single line of patients’ prior history, especially if they have been seeing that patient in the same health system for a long time,” said Dr. Danielle Bitterman, a faculty member in the Artificial Intelligence in Medicine Program at Mass General Brigham and an author of the research. “I think this is an area where we can see a huge benefit from AI in helping doctors sift through their EHR.”
Bitterman and a research team manually reviewed 800 clinician notes from 770 patients with cancer who received radiotherapy at Brigham and Women’s Hospital. They tagged sentences that referred to a patient’s employment status, housing, transportation, parental status, relationships or social support.
Using the data, the researchers trained language models to identify social determinants of health information within the EHR. When testing their generative AI model on an additional 400 clinic notes, the researchers found the model could identify 93.8% of patients with adverse social determinants of health. Official diagnostic codes, by comparison, captured this information in only 2% of cases.
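To illustrate the general approach described above, here is a minimal, hypothetical sketch of fine-tuning a pretrained language model to flag sentences that mention adverse social determinants of health. The model choice, label set, and example sentences are assumptions for illustration only and do not reflect the study's actual data or pipeline.

```python
# Hypothetical sketch: fine-tune a sentence classifier to flag adverse SDOH mentions.
# Model, labels, and example sentences are illustrative assumptions, not the study's data.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# Illustrative annotated sentences (in the study, sentences were tagged by manual review).
examples = {
    "text": [
        "Patient lives alone and has no reliable transportation to appointments.",
        "Follow-up scan scheduled in three months.",
    ],
    "label": [1, 0],  # 1 = mentions an adverse SDOH, 0 = does not
}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    # Convert raw sentences into token IDs the model can consume.
    return tokenizer(batch["text"], truncation=True, padding="max_length",
                     max_length=128)

dataset = Dataset.from_dict(examples).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sdoh-classifier",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()  # the fine-tuned classifier can then be run over held-out clinic notes
```

In practice, a workflow like this would be evaluated on held-out annotated notes, analogous to the 400 additional clinic notes the researchers used for testing.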
“There is evidence that patients may have previously had challenges with transportation, housing…employment issues that limit their ability to get care,” Bitterman said. “It just tends to be very sparsely documented, if it is documented.”
The researchers compared their model against OpenAI’s GPT-4 model. The researchers fed GPT-4 fake but similar data, since it is not compliant with the Health Insurance Portability and Accountability Act. The fine-tuned model performed better and was less prone to bias than the ChatGPT models, Bitterman said.
“We’re not at a place where these large language models are so good at handling clinical texts and understanding what we mean when we're asking about clinical questions and classifications,” Bitterman said. “We can’t yet rely on them without the additional effort [to fine-tune them]."
Bitterman acknowledged that all generative AI models have limited reach in this area, because the social determinants of health affecting a patient may never be documented in the first place. Her team plans additional research on how generative AI models learn bias.
Results of the study were published Thursday in npj Digital Medicine.