An integrated pipeline for prediction of Clostridioides difficile infection
Main authors: | Li, Jiang; Chaudhary, Durgesh; Sharma, Vaibhav; Sharma, Vishakha; Avula, Venkatesh; Ssentongo, Paddy; Wolk, Donna M.; Zand, Ramin; Abedi, Vida |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10545794/ ; https://www.ncbi.nlm.nih.gov/pubmed/37783691 ; http://dx.doi.org/10.1038/s41598-023-41753-7 |
collection | PubMed |
description | With the expansion of electronic health record (EHR)-linked genomic data comes the development of machine learning-enabled models. There is a pressing need to develop robust pipelines to evaluate the performance of integrated models and minimize systemic bias. We developed a prediction model of symptomatic Clostridioides difficile infection (CDI) by integrating common EHR-based and genetic risk factors (rs2227306/IL8). Our pipeline includes (1) leveraging a phenotyping algorithm to minimize temporal bias, (2) performing simulation studies to determine the predictive power in samples without genetic information, (3) applying propensity score matching to control for confounders, (4) selecting machine learning algorithms to capture complex feature interactions, (5) performing oversampling to address data imbalance, and (6) optimizing models and ensuring a proper bias-variance trade-off. We evaluate the performance of CDI prediction models that include common clinical risk factors, and the benefit of incorporating genetic feature(s) into those models. We emphasize the importance of building a robust integrated pipeline to avoid systemic bias, and of thoroughly evaluating genetic features integrated into prediction models, both in the general population and in subgroups. |
id | pubmed-10545794 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Scientific Reports (Sci Rep) |
published online | 2023-10-02 |
rights | © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
topic | Article |
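The abstract lists propensity score matching (step 3) as the pipeline's control for confounding but gives no details. As an illustration only, the Python sketch below shows one common variant: propensity scores from a logistic regression fitted by plain gradient descent, followed by greedy 1:1 nearest-neighbour matching on the logit of the score, without replacement, using a caliper of 0.2 standard deviations of the logit (a widely used rule of thumb). All function names, covariates, and the synthetic data are hypothetical; this is not the authors' implementation, and the record does not state which covariates, matching algorithm, or caliper they used.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w: Sequence[float], x: Sequence[float]) -> float:
    # Logit of the propensity score: bias + w . x
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Logistic regression by batch gradient descent; returns [bias, w1, ..., wk]."""
    k, n = len(X[0]), len(X)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi  # gradient of the log-loss
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def match_1to1(logits: Sequence[float], group: Sequence[int],
               caliper: float) -> List[Tuple[int, int]]:
    """Greedy 1:1 nearest-neighbour matching on the logit, without replacement."""
    controls = {i for i, g in enumerate(group) if g == 0}
    # One common greedy ordering: match the highest-scoring group-1 units first,
    # since they usually have the fewest comparable controls.
    treated = sorted((i for i, g in enumerate(group) if g == 1),
                     key=lambda i: logits[i], reverse=True)
    pairs = []
    for t in treated:
        if not controls:
            break
        c = min(controls, key=lambda i: abs(logits[i] - logits[t]))
        if abs(logits[c] - logits[t]) <= caliper:
            pairs.append((t, c))
            controls.remove(c)  # each control is used at most once
    return pairs


# --- Usage on synthetic data (hypothetical covariates, not study data) ---
rng = random.Random(0)
X, group = [], []
for _ in range(300):
    age = rng.gauss(0.0, 1.0)              # standardized age (synthetic)
    comorbid = float(rng.random() < 0.3)   # binary comorbidity flag (synthetic)
    p = sigmoid(-1.0 + 0.8 * age + 1.0 * comorbid)
    X.append([age, comorbid])
    group.append(int(rng.random() < p))

w = fit_logistic(X, group)
logits = [linear_score(w, x) for x in X]
caliper = 0.2 * statistics.pstdev(logits)  # rule-of-thumb caliper on the logit scale
pairs = match_1to1(logits, group, caliper)
print(f"{sum(group)} in group 1, {len(pairs)} matched pairs")
```

Greedy matching is order-dependent, and units without a control inside the caliper are dropped; after matching, covariate balance between the matched groups should be checked (for example with standardized mean differences) before any outcome model is fitted.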