Natural language processing to extract social determinants of health

Information about the nonmedical factors that influence health outcomes, known as the social determinants of health, is often gathered during physician appointments. However, this information is usually recorded as free text in the clinical notes written by physicians, nurses, social workers, and therapists, rather than as structured data.

Researchers at Indiana University’s Regenstrief Institute and Fairbanks School of Public Health recently published one of the first studies to apply natural language processing to the social determinants of health. They developed three new natural language processing algorithms that successfully extracted information on housing issues, financial stability, and employment status from the text of electronic health records.

“Health and well-being is not just about medical care. It’s all about our behavior, our environment, our social connections,” said Joshua Vest, PhD, a research scientist at the Regenstrief Institute and a faculty member at the Fairbanks School of Public Health who led the study. “More and more healthcare organizations are having to address social determinants because it’s factors like financial resources, housing, and employment status that really drive up costs and make people ill. The challenge for healthcare organizations is to effectively measure and identify patients at social risk so they can intervene.

“Our work contributes to advancing the field in terms of both application and methodology. Natural language processing has historically been applied to numerous diseases, but this is some of the first work to apply it to the social determinants of health. We have shown that a relatively simple natural language processing approach can effectively measure social determinants without relying on more sophisticated deep learning and neural network models. Those models are powerful, but they are complex, difficult to implement, and require a great deal of expertise that many healthcare systems lack.”

“We intentionally designed a system that runs in the background, reading all the notes and creating tags, or indicators, that flag when a patient record contains information suggesting a possible concern about a health-related social factor. Our overarching goal is to measure social determinants well enough for researchers to develop risk models, and for physicians and health systems to use these factors (housing issues, financial security, and employment status) in routine practice, both to help individuals improve their quality of life and to better understand the general characteristics and needs of their patient population.”

Joshua Vest, PhD, research scientist at Regenstrief Institute and faculty member at Fairbanks School of Public Health
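To make the idea of a background tagger concrete, here is a minimal sketch of a rule-based approach of the kind Vest contrasts with deep learning. It is not the published algorithm; the study itself describes state machines, which are not reproduced here. Instead, this uses plain keyword matching with a short negation window. The keyword lists and the names `PATTERNS`, `NEGATION_CUES`, `tag_note`, and `tag_patient` are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical keyword patterns for each social factor (illustrative only,
# not the lexicons used in the study).
PATTERNS = {
    "housing": re.compile(r"\b(homeless(ness)?|evict(ed|ion)|shelter|unstable housing)\b", re.I),
    "financial": re.compile(r"\b(can(?:not|'t) afford|financial (strain|hardship)|unable to pay)\b", re.I),
    "employment": re.compile(r"\b(unemployed|lost (his|her|their) job|laid off|out of work)\b", re.I),
}

# Hypothetical negation cues, checked in a short window before each match.
NEGATION_CUES = re.compile(r"\b(denies|denied|no|not|negative for)\b", re.I)
WINDOW = 40  # characters of preceding context to scan for negation


def tag_note(note: str) -> set[str]:
    """Return the social factors mentioned (and not negated) in one note."""
    tags = set()
    # Split into sentences so negation cues do not leak across sentences.
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for factor, pattern in PATTERNS.items():
            for match in pattern.finditer(sentence):
                prefix = sentence[max(0, match.start() - WINDOW):match.start()]
                if NEGATION_CUES.search(prefix):
                    continue  # e.g. "Denies financial strain"
                tags.add(factor)
                break  # one affirmed mention is enough for this factor
    return tags


def tag_patient(notes: list[str]) -> dict[str, bool]:
    """Combine note-level tags into one indicator per factor for a patient."""
    found = set().union(*(tag_note(n) for n in notes))
    return {factor: factor in found for factor in PATTERNS}
```

For example, `tag_note("Patient was recently evicted and is staying at a shelter. Denies financial strain.")` flags only housing, because the financial mention is negated. Simple rules like these are easy to read and maintain, which reflects the trade-off Vest describes, but they miss paraphrases and context that a validated system would need to handle.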

Social needs information can be extracted from many types of data in an electronic medical record, including the patient’s occupation, health insurance coverage, marital status, household size, address (for example, whether it is in a low- or high-crime area), and how often the address changes.

Previously, Dr. Vest and colleagues, including Dr. Shaun Grannis, vice president of data and analytics at the Regenstrief Institute, developed an app called Uppstroms (Swedish for “upstream”) and demonstrated that it can use structured data to predict which patients need a referral to a service provider such as a nutritionist.


Journal reference:

Allen, KS et al. (2023) State machines driven by natural language processing for extracting social factors from unstructured clinical documentation. JAMIA Open.
