Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models
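Main authors: | Morley, Theodore J.; Willimitis, Drew; Ripperger, Michael; Lee, Hyunjoon; Han, Lide; Zhou, Yu; Kang, Jooeun; Davis, Lea K.; Smoller, Jordan W.; Choi, Karmel W.; Walsh, Colin G.; Ruderfer, Douglas M. |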
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635256/ https://www.ncbi.nlm.nih.gov/pubmed/37961557 http://dx.doi.org/10.1101/2023.11.01.23297927 |
author | Morley, Theodore J.; Willimitis, Drew; Ripperger, Michael; Lee, Hyunjoon; Han, Lide; Zhou, Yu; Kang, Jooeun; Davis, Lea K.; Smoller, Jordan W.; Choi, Karmel W.; Walsh, Colin G.; Ruderfer, Douglas M. |
collection | PubMed |
description | Studies of the value of genetic information for improving the performance of clinical risk prediction models have reached variable conclusions. Many methodological decisions can contribute to differing results across studies. Here, we performed multiple modeling experiments integrating clinical and demographic data from electronic health records (EHR) with genetic data to understand which decision points may affect performance. Clinical data in the form of structured diagnostic codes, medications, procedural codes, and demographics were extracted from two large independent health systems, and polygenic risk scores (PRS) were generated for all patients with genetic data in the corresponding biobanks. Crohn’s disease was used as the model phenotype because of its substantial genetic component, established EHR-based definition, and sufficient prevalence for model training and testing. We investigated the impact of the PRS integration method, as well as choices regarding training sample, model complexity, and performance metrics. Overall, our results show that including PRS resulted in higher performance by some metrics, but the gain was robust only when PRS was combined with demographic data alone. Improvements were inconsistent or negligible after including additional clinical information. The impact of genetic information on performance also varied by PRS integration method, with a small improvement in some cases from combining PRS with the output of a clinical model (late fusion) compared with including it as an additional feature (early fusion). The effects of other modeling decisions varied between institutions, although performance increased with more compute-intensive models such as random forest. This work highlights the importance of considering methodological decision points when interpreting how including PRS information in clinical models affects prediction performance. |
format | Online Article Text |
id | pubmed-10635256 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10635256 2023-11-13. medRxiv Article. Cold Spring Harbor Laboratory, 2023-11-01. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
title | Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models |
topic | Article |
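
The abstract contrasts two ways of adding a PRS to a clinical model. Early fusion includes the PRS as one more input feature. Late fusion combines the PRS with the output of a separately trained clinical model. The following is a minimal sketch of that distinction only, not the authors' pipeline. It assumes a hypothetical synthetic cohort produced by an illustrative `simulate` function, with arbitrary effect sizes. It uses plain logistic regression fitted by gradient descent, whereas the study also examined other models such as random forest. It scores models by ROC AUC, which is one possible metric among several. For brevity, the late-fusion combiner is fitted on in-sample clinical-model outputs. A real analysis would more likely use out-of-fold predictions. All function names here are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit a logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def log_odds(model, x):
    """Linear predictor (log-odds) of a fitted logistic model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def auc(scores, labels):
    """ROC AUC: probability a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def simulate(n, seed):
    """Hypothetical synthetic cohort: 3 'clinical' features, 1 PRS, binary outcome."""
    rng = random.Random(seed)
    clin, prs, y = [], [], []
    for _ in range(n):
        c = [rng.gauss(0, 1) for _ in range(3)]
        g = rng.gauss(0, 1)
        risk = -1.0 + 0.8 * c[0] + 0.4 * c[1] + 0.5 * g  # arbitrary illustrative effects
        clin.append(c)
        prs.append(g)
        y.append(1 if rng.random() < sigmoid(risk) else 0)
    return clin, prs, y


def clinical_only(train, test):
    """Baseline: clinical features only, no PRS."""
    (clin, _, y), (clin_t, _, y_t) = train, test
    model = fit_logistic(clin, y)
    return auc([log_odds(model, c) for c in clin_t], y_t)


def early_fusion(train, test):
    """Early fusion: the PRS is appended as one more input feature of a single model."""
    (clin, prs, y), (clin_t, prs_t, y_t) = train, test
    model = fit_logistic([c + [g] for c, g in zip(clin, prs)], y)
    return auc([log_odds(model, c + [g]) for c, g in zip(clin_t, prs_t)], y_t)


def late_fusion(train, test):
    """Late fusion: a clinical-only model's output is combined with the PRS in a second model."""
    (clin, prs, y), (clin_t, prs_t, y_t) = train, test
    clinical = fit_logistic(clin, y)
    # Simplification: stage-two inputs are in-sample clinical-model outputs
    stage1 = [log_odds(clinical, c) for c in clin]
    combiner = fit_logistic([[s, g] for s, g in zip(stage1, prs)], y)
    scores = [log_odds(combiner, [log_odds(clinical, c), g]) for c, g in zip(clin_t, prs_t)]
    return auc(scores, y_t)


if __name__ == "__main__":
    train, test = simulate(400, seed=1), simulate(200, seed=2)
    for name, fn in [("clinical only", clinical_only),
                     ("early fusion", early_fusion),
                     ("late fusion", late_fusion)]:
        print(f"{name}: test AUC = {fn(train, test):.3f}")
```

Running the script prints test-set AUCs for the three variants on the synthetic cohort. The numbers illustrate only the mechanics of the two integration strategies. They say nothing about Crohn's disease or the study's findings.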