Picking single-nucleotide polymorphisms in forests

With the development of high-throughput single-nucleotide polymorphism (SNP) technologies, the vast number of SNPs in smaller samples poses a challenge to the application of classical statistical procedures. A possible solution is to use a two-stage approach for case-control data in which, in the fi...

Bibliographic Details
Main authors:	Schwarz, Daniel F; Szymczak, Silke; Ziegler, Andreas; König, Inke R
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367487/ https://www.ncbi.nlm.nih.gov/pubmed/18466559

author	Schwarz, Daniel F; Szymczak, Silke; Ziegler, Andreas; König, Inke R
collection	PubMed
description	With the development of high-throughput single-nucleotide polymorphism (SNP) technologies, the vast number of SNPs in smaller samples poses a challenge to the application of classical statistical procedures. A possible solution is to use a two-stage approach for case-control data in which, in the first stage, a screening test selects a small number of SNPs for further analysis. The second stage then estimates the effects of the selected variables using logistic regression (logReg). Here, we introduce a novel approach in which the selection of SNPs is based on the permutation importance estimated by random forests (RFs). For this, we used the simulated data provided for the Genetic Analysis Workshop 15 without knowledge of the true model. The data set was randomly split into a first and a second data set. In the first stage, RFs were grown to pre-select the 37 most important variables, and these were reduced to 32 variables by haplotype tagging. In the second stage, we estimated parameters using logReg. The highest effect estimates were obtained for five simulated loci. We detected smoking, gender, and the parental DR alleles as covariates. After correction for multiple testing, we identified two out of four genes simulated with a direct effect on rheumatoid arthritis risk and all covariates without any false positive. We showed that a two-staged approach with a screening of SNPs by RFs is suitable to detect candidate SNPs in genome-wide association studies for complex diseases.
format	Text
id	pubmed-2367487
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2367487 2008-05-06 Picking single-nucleotide polymorphisms in forests Schwarz, Daniel F; Szymczak, Silke; Ziegler, Andreas; König, Inke R BMC Proc Proceedings BioMed Central 2007-12-18 /pmc/articles/PMC2367487/ /pubmed/18466559 Text en Copyright © 2007 Schwarz et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Picking single-nucleotide polymorphisms in forests
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367487/ https://www.ncbi.nlm.nih.gov/pubmed/18466559

Similar items