
Human genotype-to-phenotype predictions: Boosting accuracy with nonlinear models

Bibliographic Details
Main authors: Medvedev, Aleksandr; Mishra Sharma, Satyarth; Tsatsorin, Evgenii; Nabieva, Elena; Yarotsky, Dmitry
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9432766/
https://www.ncbi.nlm.nih.gov/pubmed/36044406
http://dx.doi.org/10.1371/journal.pone.0273293
_version_ 1784780461409566720
author Medvedev, Aleksandr
Mishra Sharma, Satyarth
Tsatsorin, Evgenii
Nabieva, Elena
Yarotsky, Dmitry
collection PubMed
description Genotype-to-phenotype prediction is a central problem of human genetics. In recent years, it has become possible to construct complex predictive models for phenotypes, thanks to the availability of large genome data sets as well as efficient and scalable machine learning tools. In this paper, we make a threefold contribution to this problem. First, we ask whether state-of-the-art nonlinear predictive models, such as boosted decision trees, can be more efficient for phenotype prediction than conventional linear models. We find that this is indeed the case if model features include a sufficiently rich set of covariates, but probably not otherwise. Second, we ask whether the conventional selection of single nucleotide polymorphisms (SNPs) by genome-wide association studies (GWAS) can be replaced by a more efficient procedure that takes into account information in previously selected SNPs. We propose such a procedure, based on sequential feature importance estimation with decision trees, and show that this approach indeed produces informative SNP sets that are much more compact than those selected with GWAS. Finally, we show that the highest prediction accuracy can ultimately be achieved by ensembling individual linear and nonlinear models. To the best of our knowledge, for some of the phenotypes that we consider (asthma, hypothyroidism), our results set a new state of the art.
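The abstract's second point, that selecting SNPs one at a time while accounting for those already chosen can give a more compact set than ranking each SNP on its own, can be illustrated with a toy sketch. The code below is not the authors' procedure: it is a minimal stand-in that assumes depth-1 decision trees (stumps) fitted greedily to residuals, a squared-error gain as the importance score, and a tiny hand-made data set. All function names and data are hypothetical.

```python
"""Toy sketch: residual-based sequential SNP selection vs. marginal ranking.

Assumptions (not from the paper): depth-1 trees, squared-error gain,
full-step residual updates, and a hand-made 8-sample data set.
"""


def stump_gain(column, residuals):
    """Best single-threshold split of one SNP column against the residuals.

    Returns (gain, threshold, left_mean, right_mean); gain is the reduction
    in the residual sum of squares achieved by the split.
    """
    n = len(residuals)
    total_mean = sum(residuals) / n
    base_sse = sum((r - total_mean) ** 2 for r in residuals)
    best = (0.0, None, total_mean, total_mean)
    for threshold in sorted(set(column))[:-1]:  # the largest value cannot split
        left = [r for x, r in zip(column, residuals) if x <= threshold]
        right = [r for x, r in zip(column, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if base_sse - sse > best[0]:
            best = (base_sse - sse, threshold, left_mean, right_mean)
    return best


def marginal_ranking(X, y, k):
    """Score every SNP independently against y and keep the top k."""
    columns = list(zip(*X))
    gains = [stump_gain(col, y)[0] for col in columns]
    return sorted(range(len(columns)), key=lambda j: -gains[j])[:k]


def sequential_selection(X, y, max_snps, min_gain=1e-12):
    """Pick SNPs one at a time, scoring each against what is still unexplained."""
    columns = list(zip(*X))
    mean_y = sum(y) / len(y)
    residuals = [v - mean_y for v in y]
    selected = []
    while len(selected) < max_snps:
        best_j, best = None, (min_gain, None, 0.0, 0.0)
        for j, col in enumerate(columns):
            candidate = stump_gain(col, residuals)
            if candidate[0] > best[0]:
                best_j, best = j, candidate
        if best_j is None:  # no SNP explains anything further
            break
        _, threshold, left_mean, right_mean = best
        # Subtract the chosen stump's prediction so later rounds only see
        # the signal that the already-selected SNPs leave unexplained.
        residuals = [r - (left_mean if x <= threshold else right_mean)
                     for x, r in zip(columns[best_j], residuals)]
        if best_j not in selected:
            selected.append(best_j)
    return selected


# Hypothetical data: SNP 1 is an exact copy of SNP 0 (perfect linkage),
# SNP 2 has a smaller independent effect, SNP 3 is unrelated to y.
snp0 = [0, 0, 0, 0, 1, 1, 1, 1]
snp1 = list(snp0)
snp2 = [0, 1, 0, 1, 0, 1, 0, 1]
snp3 = [0, 0, 1, 1, 0, 0, 1, 1]
X = [list(row) for row in zip(snp0, snp1, snp2, snp3)]
y = [2.0 * a + 1.0 * c for a, c in zip(snp0, snp2)]

if __name__ == "__main__":
    print(marginal_ranking(X, y, 2))      # [0, 1]
    print(sequential_selection(X, y, 2))  # [0, 2]
```

On this toy data, the marginal ranking spends both slots on the two copies of the same signal, whereas the sequential pass skips the redundant copy and picks the SNP with the independent effect. A realistic pipeline would use a gradient-boosting library and real genotype data rather than this hand-rolled stump.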
format Online
Article
Text
id pubmed-9432766
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9432766 2022-09-01 PLoS One Research Article Public Library of Science 2022-08-31 /pmc/articles/PMC9432766/ /pubmed/36044406 http://dx.doi.org/10.1371/journal.pone.0273293 Text en © 2022 Medvedev et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Human genotype-to-phenotype predictions: Boosting accuracy with nonlinear models
topic Research Article