
Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth

Bibliographic Details
Main authors: Abraham, Abin; Le, Brian; Kosti, Idit; Straub, Peter; Velez-Edwards, Digna R.; Davis, Lea K.; Newton, J. M.; Muglia, Louis J.; Rokas, Antonis; Bejan, Cosmin A.; Sirota, Marina; Capra, John A.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9516830/
https://www.ncbi.nlm.nih.gov/pubmed/36167547
http://dx.doi.org/10.1186/s12916-022-02522-x
collection PubMed
description BACKGROUND: Identifying pregnancies at risk for preterm birth, one of the leading causes of worldwide infant mortality, has the potential to improve prenatal care. However, we lack broadly applicable methods to accurately predict preterm birth risk. The dense longitudinal information present in electronic health records (EHRs) is enabling scalable and cost-efficient risk modeling of many diseases, but EHR resources have been largely untapped in the study of pregnancy.
METHODS: Here, we apply machine learning to diverse data from EHRs with 35,282 deliveries to predict singleton preterm birth.
RESULTS: We find that machine learning models based on billing codes alone can predict preterm birth risk at various gestational ages (e.g., ROC-AUC = 0.75, PR-AUC = 0.40 at 28 weeks of gestation) and outperform comparable models trained using known risk factors (e.g., ROC-AUC = 0.65, PR-AUC = 0.25 at 28 weeks). Examining the patterns learned by the model reveals it stratifies deliveries into interpretable groups, including high-risk preterm birth subtypes enriched for distinct comorbidities. Our machine learning approach also predicts preterm birth subtypes (spontaneous vs. indicated), mode of delivery, and recurrent preterm birth. Finally, we demonstrate the portability of our approach by showing that the prediction models maintain their accuracy on a large, independent cohort (5978 deliveries) from a different healthcare system.
CONCLUSIONS: By leveraging rich phenotypic and genetic features derived from EHRs, we suggest that machine learning algorithms have great potential to improve medical care during pregnancy. However, further work is needed before these models can be applied in clinical settings.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12916-022-02522-x.
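The abstract reports model performance as ROC-AUC and PR-AUC. As a reading aid, here is a minimal Python sketch (standard library only, not the authors' code) of how these two metrics are commonly computed from binary outcome labels and predicted risk scores. It assumes PR-AUC is estimated as average precision and that scores are distinct; the function names and toy data are hypothetical. A random classifier's PR-AUC is roughly the fraction of positive cases, so PR-AUC values such as 0.40 versus 0.25 are best read against the cohort's preterm-birth prevalence, which the abstract does not state.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """PR-AUC estimated as average precision (assumes distinct scores).

    Rank cases by descending score, record precision at the rank of each
    positive case, and average those precisions over all positives.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            precision_sum += true_pos / rank
    return precision_sum / n_pos


if __name__ == "__main__":
    # Hypothetical toy cohort: 1 = preterm delivery, 0 = term delivery.
    labels = [1, 0, 1, 0, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    print(round(roc_auc(labels, scores), 3))            # 0.733
    print(round(average_precision(labels, scores), 3))  # 0.722
    print(sum(labels) / len(labels))                    # prevalence: 0.375
```

In this toy example both metrics sit well above the 0.5 ROC-AUC and 0.375 PR-AUC expected from random ranking. The pairwise ROC-AUC loop is quadratic and is meant only to make the definition explicit; library implementations use a sort-based equivalent.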
id pubmed-9516830
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Med
article_type Research Article
published 2022-09-28
rights © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research Article