Hello, science.

My lab investigates methods to identify, quantify, and mitigate algorithmic bias, as well as evidence of clinician bias baked into Natural Language Processing (NLP) models derived from Electronic Health Records (EHRs). I am also interested in how best to evaluate whether an algorithm performs equitably across institutions, and across cohorts within the same institution, as part of an effort to reuse existing models rather than train new ones. Active projects include using NLP extractions from unstructured clinical notes to reduce documentation inequities, using NLP to monitor and measure pejorative language in notes, and using NLP to improve the speed and efficiency of clinical trial enrollment.

As Head of the NLP Core (a service center), I regularly apply state-of-the-art NLP and machine learning methods to extract social determinants of health (SDoH) and other factors from unstructured clinical notes. The NLP Core is also responsible for de-identifying unstructured clinical notes to protect patient privacy and reduce the risk of protected health information (PHI) leakage in the course of routine research. My work tends to be computationally technical, but I am also open to students with expertise in statistical methods, ethical/equity frameworks, or bias reduction.
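For a flavor of the de-identification side of this work, here is a minimal sketch of redacting PHI from a note with an off-the-shelf token-classification model. It assumes the Hugging Face `transformers` library; the model identifier, the `redact` helper, and the sample note are illustrative placeholders, not the NLP Core's production pipeline.

```python
# Minimal sketch: NER-based PHI redaction, assuming the Hugging Face
# `transformers` library. The model name below is an illustrative example
# of a clinical de-identification model, not a specific recommendation.
from transformers import pipeline

# Token-classification pipeline; aggregation_strategy="simple" merges
# word-piece tokens back into whole entity spans with char offsets.
deid = pipeline(
    "token-classification",
    model="obi/deid_roberta_i2b2",  # example de-id model identifier
    aggregation_strategy="simple",
)

def redact(note: str) -> str:
    """Replace each detected PHI span with a bracketed category tag."""
    # Process spans right-to-left so earlier character offsets stay
    # valid as the note string is rewritten.
    spans = sorted(deid(note), key=lambda s: s["start"], reverse=True)
    for s in spans:
        note = note[: s["start"]] + f"[{s['entity_group']}]" + note[s["end"] :]
    return note

print(redact("Mr. John Smith was seen at Mercy General on 3/14/2024."))
```

In practice, de-identification for research release would also involve offset auditing and human review; the sketch only shows the core detect-and-replace step.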

Recent Publications

Risk modeling without increasing risk

Initial development of tools to identify child abuse and neglect in pediatric primary care

Algorithmic Bias in De-Identification Tools

Implicit bias: Measuring the impact of pejorative and laudative language by clinicians on language models
