Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth
BACKGROUND: Identifying pregnancies at risk for preterm birth, one of the leading causes of worldwide infant mortality, has the potential to improve prenatal care. However, we lack broadly applicable methods to accurately predict preterm birth risk. The dense longitudinal information present in elec...
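Main authors: | Abraham, Abin; Le, Brian; Kosti, Idit; Straub, Peter; Velez-Edwards, Digna R.; Davis, Lea K.; Newton, J. M.; Muglia, Louis J.; Rokas, Antonis; Bejan, Cosmin A.; Sirota, Marina; Capra, John A. |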
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9516830/ https://www.ncbi.nlm.nih.gov/pubmed/36167547 http://dx.doi.org/10.1186/s12916-022-02522-x |
_version_ | 1784798788861296640 |
---|---|
author | Abraham, Abin; Le, Brian; Kosti, Idit; Straub, Peter; Velez-Edwards, Digna R.; Davis, Lea K.; Newton, J. M.; Muglia, Louis J.; Rokas, Antonis; Bejan, Cosmin A.; Sirota, Marina; Capra, John A. |
author_facet | Abraham, Abin; Le, Brian; Kosti, Idit; Straub, Peter; Velez-Edwards, Digna R.; Davis, Lea K.; Newton, J. M.; Muglia, Louis J.; Rokas, Antonis; Bejan, Cosmin A.; Sirota, Marina; Capra, John A. |
author_sort | Abraham, Abin |
collection | PubMed |
description | BACKGROUND: Identifying pregnancies at risk for preterm birth, one of the leading causes of worldwide infant mortality, has the potential to improve prenatal care. However, we lack broadly applicable methods to accurately predict preterm birth risk. The dense longitudinal information present in electronic health records (EHRs) is enabling scalable and cost-efficient risk modeling of many diseases, but EHR resources have been largely untapped in the study of pregnancy. METHODS: Here, we apply machine learning to diverse data from EHRs with 35,282 deliveries to predict singleton preterm birth. RESULTS: We find that machine learning models based on billing codes alone can predict preterm birth risk at various gestational ages (e.g., ROC-AUC = 0.75, PR-AUC = 0.40 at 28 weeks of gestation) and outperform comparable models trained using known risk factors (e.g., ROC-AUC = 0.65, PR-AUC = 0.25 at 28 weeks). Examining the patterns learned by the model reveals it stratifies deliveries into interpretable groups, including high-risk preterm birth subtypes enriched for distinct comorbidities. Our machine learning approach also predicts preterm birth subtypes (spontaneous vs. indicated), mode of delivery, and recurrent preterm birth. Finally, we demonstrate the portability of our approach by showing that the prediction models maintain their accuracy on a large, independent cohort (5978 deliveries) from a different healthcare system. CONCLUSIONS: By leveraging rich phenotypic and genetic features derived from EHRs, we suggest that machine learning algorithms have great potential to improve medical care during pregnancy. However, further work is needed before these models can be applied in clinical settings. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12916-022-02522-x. |
format | Online Article Text |
id | pubmed-9516830 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9516830 2022-09-29 Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth Abraham, Abin Le, Brian Kosti, Idit Straub, Peter Velez-Edwards, Digna R. Davis, Lea K. Newton, J. M. Muglia, Louis J. Rokas, Antonis Bejan, Cosmin A. Sirota, Marina Capra, John A. BMC Med Research Article BACKGROUND: Identifying pregnancies at risk for preterm birth, one of the leading causes of worldwide infant mortality, has the potential to improve prenatal care. However, we lack broadly applicable methods to accurately predict preterm birth risk. The dense longitudinal information present in electronic health records (EHRs) is enabling scalable and cost-efficient risk modeling of many diseases, but EHR resources have been largely untapped in the study of pregnancy. METHODS: Here, we apply machine learning to diverse data from EHRs with 35,282 deliveries to predict singleton preterm birth. RESULTS: We find that machine learning models based on billing codes alone can predict preterm birth risk at various gestational ages (e.g., ROC-AUC = 0.75, PR-AUC = 0.40 at 28 weeks of gestation) and outperform comparable models trained using known risk factors (e.g., ROC-AUC = 0.65, PR-AUC = 0.25 at 28 weeks). Examining the patterns learned by the model reveals it stratifies deliveries into interpretable groups, including high-risk preterm birth subtypes enriched for distinct comorbidities. Our machine learning approach also predicts preterm birth subtypes (spontaneous vs. indicated), mode of delivery, and recurrent preterm birth. Finally, we demonstrate the portability of our approach by showing that the prediction models maintain their accuracy on a large, independent cohort (5978 deliveries) from a different healthcare system. CONCLUSIONS: By leveraging rich phenotypic and genetic features derived from EHRs, we suggest that machine learning algorithms have great potential to improve medical care during pregnancy. However, further work is needed before these models can be applied in clinical settings. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12916-022-02522-x. BioMed Central 2022-09-28 /pmc/articles/PMC9516830/ /pubmed/36167547 http://dx.doi.org/10.1186/s12916-022-02522-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Abraham, Abin Le, Brian Kosti, Idit Straub, Peter Velez-Edwards, Digna R. Davis, Lea K. Newton, J. M. Muglia, Louis J. Rokas, Antonis Bejan, Cosmin A. Sirota, Marina Capra, John A. Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth |
title | Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth |
title_full | Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth |
title_fullStr | Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth |
title_full_unstemmed | Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth |
title_short | Dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth |
title_sort | dense phenotyping from electronic health records enables machine learning-based prediction of preterm birth |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9516830/ https://www.ncbi.nlm.nih.gov/pubmed/36167547 http://dx.doi.org/10.1186/s12916-022-02522-x |
work_keys_str_mv | AT abrahamabin densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth AT lebrian densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth AT kostiidit densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth AT straubpeter densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth AT velezedwardsdignar densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth AT davisleak densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth AT newtonjm densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth AT muglialouisj densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth AT rokasantonis densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth AT bejancosmina densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth AT sirotamarina densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth AT caprajohna densephenotypingfromelectronichealthrecordsenablesmachinelearningbasedpredictionofpretermbirth |
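
The RESULTS in the description above report two metrics, ROC-AUC and PR-AUC, for each model. The following is a minimal Python sketch of how these two quantities can be computed from binary labels and predicted risk scores. It is an illustration only, not the authors' evaluation code: the function names and the toy data are hypothetical, and PR-AUC is approximated here by average precision, which is one common convention that the record does not confirm the authors used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (tied scores count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, used here as a step-wise stand-in for PR-AUC:
    the mean of the precision values measured at the rank of each positive
    case. Assumes scores are distinct (no special handling of ties)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` cases
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Hypothetical toy data (not from the study): 1 = preterm, 0 = not preterm.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

if __name__ == "__main__":
    prevalence = sum(toy_labels) / len(toy_labels)
    print("ROC-AUC:", roc_auc(toy_labels, toy_scores))
    print("Average precision:", average_precision(toy_labels, toy_scores))
    print("Prevalence (chance-level precision):", prevalence)
```

A scorer that ranks cases at random has an expected ROC-AUC of 0.5 regardless of class balance, whereas its expected precision equals the prevalence of the positive class. PR-AUC values are therefore best read against the outcome rate in the cohort rather than against 0.5.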