An integrated pipeline for prediction of Clostridioides difficile infection

With the expansion of electronic health record (EHR)-linked genomic data comes the development of machine learning-enabled models. There is a pressing need to develop robust pipelines to evaluate the performance of integrated models and minimize systemic bias. We developed a prediction model of sympt...

Bibliographic Details
Main Authors: Li, Jiang; Chaudhary, Durgesh; Sharma, Vaibhav; Sharma, Vishakha; Avula, Venkatesh; Ssentongo, Paddy; Wolk, Donna M.; Zand, Ramin; Abedi, Vida
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10545794/
https://www.ncbi.nlm.nih.gov/pubmed/37783691
http://dx.doi.org/10.1038/s41598-023-41753-7
author Li, Jiang
Chaudhary, Durgesh
Sharma, Vaibhav
Sharma, Vishakha
Avula, Venkatesh
Ssentongo, Paddy
Wolk, Donna M.
Zand, Ramin
Abedi, Vida
collection PubMed
description With the expansion of electronic health record (EHR)-linked genomic data comes the development of machine learning-enabled models. There is a pressing need to develop robust pipelines to evaluate the performance of integrated models and minimize systemic bias. We developed a prediction model of symptomatic Clostridioides difficile infection (CDI) by integrating common EHR-based and genetic risk factors (rs2227306/IL8). Our pipeline includes (1) leveraging a phenotyping algorithm to minimize temporal bias, (2) performing simulation studies to determine the predictive power in samples without genetic information, (3) propensity score matching to control for confounders, (4) selecting machine learning algorithms to capture complex feature interactions, (5) performing oversampling to address data imbalance, and (6) optimizing models and ensuring a proper bias-variance trade-off. We evaluate the performance of prediction models of CDI when including common clinical risk factors and the benefit of incorporating genetic feature(s) into the models. We emphasize the importance of building a robust integrated pipeline to avoid systemic bias and of thoroughly evaluating genetic features when they are integrated into prediction models in the general population and subgroups.
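Step (5) of the pipeline in the description above, oversampling to address data imbalance, could look roughly like the sketch below. It is only an illustration under stated assumptions: the record does not say which oversampling method the authors used, so plain random duplication of minority-class rows stands in for it, and the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class.

    Assumption: simple random oversampling; the source does not
    name the method actually used.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows belonging to this class in the original data
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement to fill the gap up to the target size
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical training split: 6 controls (label 0) and 2 cases (label 1)
train_rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
train_labels = [0, 0, 0, 0, 0, 0, 1, 1]

bal_rows, bal_labels = random_oversample(train_rows, train_labels)
print(Counter(bal_labels))
```

Oversampling is normally applied to the training split only, so that duplicated cases never leak into the data used to evaluate the model.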
format Online
Article
Text
id pubmed-10545794
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10545794 2023-10-04 An integrated pipeline for prediction of Clostridioides difficile infection. Sci Rep. Article.
Nature Publishing Group UK 2023-10-02 /pmc/articles/PMC10545794/ /pubmed/37783691 http://dx.doi.org/10.1038/s41598-023-41753-7 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title An integrated pipeline for prediction of Clostridioides difficile infection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10545794/
https://www.ncbi.nlm.nih.gov/pubmed/37783691
http://dx.doi.org/10.1038/s41598-023-41753-7