Evaluation of tree-based statistical learning methods for constructing genetic risk scores

BACKGROUND: Genetic risk scores (GRS) summarize genetic features such as single nucleotide polymorphisms (SNPs) in a single statistic with respect to a given trait. So far, GRS are typically built using generalized linear models or regularized extensions. However, these linear methods are usually no...

Bibliographic Details
Main Authors:	Lau, Michael; Wigmann, Claudia; Kress, Sara; Schikowski, Tamara; Schwender, Holger
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8935722/ https://www.ncbi.nlm.nih.gov/pubmed/35313824 http://dx.doi.org/10.1186/s12859-022-04634-w

author	Lau, Michael; Wigmann, Claudia; Kress, Sara; Schikowski, Tamara; Schwender, Holger
collection	PubMed
description	BACKGROUND: Genetic risk scores (GRS) summarize genetic features such as single nucleotide polymorphisms (SNPs) in a single statistic with respect to a given trait. So far, GRS are typically built using generalized linear models or regularized extensions. However, these linear methods are usually not able to incorporate gene-gene interactions or non-linear SNP-response relationships. Tree-based statistical learning methods such as random forests and logic regression may be an alternative to such regularized-regression-based methods and are investigated in this article. Moreover, we consider modifications of random forests and logic regression for the construction of GRS. RESULTS: In an extensive simulation study and an application to a real data set from a German cohort study, we show that both tree-based approaches can outperform elastic net when constructing GRS for binary traits. In particular, a modification of logic regression called logic bagging could yield comparatively high predictive power, as measured by the area under the curve and the statistical power. Even when considering no epistatic interaction effects but only marginal genetic effects, the regularized regression method led in most cases to inferior results. CONCLUSIONS: When constructing GRS, we recommend taking random forests and logic bagging into account, in particular if it can be assumed that possibly unknown epistasis between SNPs is present. To develop the best possible prediction models, extensive joint hyperparameter optimizations should be conducted. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04634-w.
format	Online Article Text
id	pubmed-8935722
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8935722 2022-03-23 Evaluation of tree-based statistical learning methods for constructing genetic risk scores Lau, Michael; Wigmann, Claudia; Kress, Sara; Schikowski, Tamara; Schwender, Holger BMC Bioinformatics Research
BioMed Central 2022-03-21 /pmc/articles/PMC8935722/ /pubmed/35313824 http://dx.doi.org/10.1186/s12859-022-04634-w Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Evaluation of tree-based statistical learning methods for constructing genetic risk scores
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8935722/ https://www.ncbi.nlm.nih.gov/pubmed/35313824 http://dx.doi.org/10.1186/s12859-022-04634-w