
Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models

Studies of the value of genetic information for improving the performance of clinical risk prediction models have reached variable conclusions. Many methodological decisions have the potential to contribute to differential results across studies. Here, we performed multiple modeling experiments integrating clin...


Bibliographic Details
Main authors: Morley, Theodore J., Willimitis, Drew, Ripperger, Michael, Lee, Hyunjoon, Han, Lide, Zhou, Yu, Kang, Jooeun, Davis, Lea K., Smoller, Jordan W., Choi, Karmel W., Walsh, Colin G., Ruderfer, Douglas M.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635256/
https://www.ncbi.nlm.nih.gov/pubmed/37961557
http://dx.doi.org/10.1101/2023.11.01.23297927
_version_ 1785146313216622592
author Morley, Theodore J.
Willimitis, Drew
Ripperger, Michael
Lee, Hyunjoon
Han, Lide
Zhou, Yu
Kang, Jooeun
Davis, Lea K.
Smoller, Jordan W.
Choi, Karmel W.
Walsh, Colin G.
Ruderfer, Douglas M.
author_sort Morley, Theodore J.
collection PubMed
description Studies of the value of genetic information for improving the performance of clinical risk prediction models have reached variable conclusions. Many methodological decisions have the potential to contribute to differential results across studies. Here, we performed multiple modeling experiments integrating clinical and demographic data from electronic health records (EHR) and genetic data to understand which decision points may affect performance. Clinical data in the form of structured diagnostic codes, medications, procedural codes, and demographics were extracted from two large independent health systems, and polygenic risk scores (PRS) were generated across all patients with genetic data in the corresponding biobanks. Crohn’s disease was used as the model phenotype based on its substantial genetic component, established EHR-based definition, and sufficient prevalence for model training and testing. We investigated the impact of PRS integration method, as well as choices regarding training sample, model complexity, and performance metrics. Overall, our results show that including PRS resulted in higher performance by some metrics, but the gain in performance was robust only when PRS was combined with demographic data alone. Improvements were inconsistent or negligible after including additional clinical information. The impact of genetic information on performance also varied by PRS integration method, with a small improvement in some cases from combining PRS with the output of a clinical model (late-fusion) compared to its inclusion as an additional feature (early-fusion). The effects of other modeling decisions varied between institutions, though performance increased with more compute-intensive models such as random forest. This work highlights the importance of considering methodological decision points when interpreting the impact on prediction performance of including PRS information in clinical models.
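The distinction the abstract draws between early-fusion and late-fusion can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the study: the data are synthetic, the feature counts and effect sizes are arbitrary, and plain logistic regression fitted by gradient descent stands in for whatever models the authors used. The helper names (`fit_logistic`, `predict`, `auc`, `make_data`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit logistic regression by batch gradient descent; returns [bias, w1, ...]."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, X):
    # Predicted probability for each row of X.
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def auc(y, scores):
    """AUROC: chance that a random case outranks a random control (ties count 0.5)."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_data(n, rng):
    """Synthetic toy data (NOT from the study): two clinical features and one PRS."""
    clinical, prs, y = [], [], []
    for _ in range(n):
        c = [rng.gauss(0, 1), rng.gauss(0, 1)]
        g = rng.gauss(0, 1)
        p = sigmoid(-1.0 + 1.0 * c[0] + 0.5 * c[1] + 0.7 * g)
        clinical.append(c)
        prs.append(g)
        y.append(1 if rng.random() < p else 0)
    return clinical, prs, y

rng = random.Random(0)
c_tr, g_tr, y_tr = make_data(400, rng)
c_te, g_te, y_te = make_data(400, rng)

# Early fusion: the PRS is appended to the clinical features as one more column.
early_w = fit_logistic([c + [g] for c, g in zip(c_tr, g_tr)], y_tr)
early_scores = predict(early_w, [c + [g] for c, g in zip(c_te, g_te)])

# Late fusion: fit a clinical-only model first, then fit a second model
# on the pair (clinical model output, PRS).
clin_w = fit_logistic(c_tr, y_tr)
late_w = fit_logistic([[s, g] for s, g in zip(predict(clin_w, c_tr), g_tr)], y_tr)
late_scores = predict(late_w, [[s, g] for s, g in zip(predict(clin_w, c_te), g_te)])

clin_auc = auc(y_te, predict(clin_w, c_te))
early_auc = auc(y_te, early_scores)
late_auc = auc(y_te, late_scores)
print(f"clinical only: {clin_auc:.3f}  early: {early_auc:.3f}  late: {late_auc:.3f}")
```

In this toy setup the second-stage late-fusion model is fitted on the same training rows as the clinical model, which can overstate the clinical score's reliability; a more careful version would use out-of-fold clinical predictions. The sketch reports only AUROC, whereas the abstract notes that conclusions can differ by performance metric.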
format Online
Article
Text
id pubmed-10635256
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10635256 2023-11-13 medRxiv Article Cold Spring Harbor Laboratory 2023-11-01 /pmc/articles/PMC10635256/ /pubmed/37961557 http://dx.doi.org/10.1101/2023.11.01.23297927 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635256/
https://www.ncbi.nlm.nih.gov/pubmed/37961557
http://dx.doi.org/10.1101/2023.11.01.23297927