Comparison of parametric and machine methods for variable selection in simulated Genetic Analysis Workshop 19 data

Current findings from genetic studies of complex human traits often do not explain a large proportion of the estimated variation of these traits due to genetic factors. This could be, in part, due to overly stringent significance thresholds in traditional statistical methods, such as linear and logi...

Bibliographic Details
Main Authors:	Holzinger, Emily R.; Szymczak, Silke; Malley, James; Pugh, Elizabeth W.; Ling, Hua; Griffith, Sean; Zhang, Peng; Li, Qing; Cropp, Cheryl D.; Bailey-Wilson, Joan E.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133476/ https://www.ncbi.nlm.nih.gov/pubmed/27980627 http://dx.doi.org/10.1186/s12919-016-0021-1

author	Holzinger, Emily R.; Szymczak, Silke; Malley, James; Pugh, Elizabeth W.; Ling, Hua; Griffith, Sean; Zhang, Peng; Li, Qing; Cropp, Cheryl D.; Bailey-Wilson, Joan E.
collection	PubMed
description	Current findings from genetic studies of complex human traits often do not explain a large proportion of the estimated variation of these traits due to genetic factors. This could be, in part, due to overly stringent significance thresholds in traditional statistical methods, such as linear and logistic regression. Machine learning methods, such as Random Forests (RF), are an alternative approach to identify potentially interesting variants. One major issue with these methods is that there is no clear way to distinguish between probable true hits and noise variables based on the importance metric calculated. To this end, we are developing a method called the Relative Recurrency Variable Importance Metric (r2VIM), an RF-based variable selection method. Here, we apply r2VIM to the unrelated Genetic Analysis Workshop 19 data with simulated systolic blood pressure as the phenotype. We compare the number of “true” functional variants identified by r2VIM with those identified by linear regression analyses that use a Bonferroni correction to calculate a significance threshold. Our results show that r2VIM performed comparably to linear regression. Our findings are proof-of-concept for r2VIM, as it identifies a similar number of functional and nonfunctional variants as a more commonly used technique when the optimal importance score threshold is used.
format	Online Article Text
id	pubmed-5133476
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5133476 2016-12-15 BMC Proc Proceedings BioMed Central 2016-10-18 /pmc/articles/PMC5133476/ /pubmed/27980627 http://dx.doi.org/10.1186/s12919-016-0021-1 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Comparison of parametric and machine methods for variable selection in simulated Genetic Analysis Workshop 19 data
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133476/ https://www.ncbi.nlm.nih.gov/pubmed/27980627 http://dx.doi.org/10.1186/s12919-016-0021-1