
Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest

We study the number of causal variants and associated regions identified by top SNPs in rankings given by the popular 1 df chi-squared statistic, support vector machine (SVM) and the random forest (RF) on simulated and real data. If we apply the SVM and RF to the top 2r chi-square-ranked SNPs, where...


Bibliographic Details
Main authors: Roshan, Usman, Chikkagoudar, Satish, Wei, Zhi, Wang, Kai, Hakonarson, Hakon
Format: Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3089490/
https://www.ncbi.nlm.nih.gov/pubmed/21317188
http://dx.doi.org/10.1093/nar/gkr064
author Roshan, Usman
Chikkagoudar, Satish
Wei, Zhi
Wang, Kai
Hakonarson, Hakon
collection PubMed
description We study the number of causal variants and associated regions identified by top SNPs in rankings given by the popular 1 df chi-squared statistic, support vector machine (SVM) and the random forest (RF) on simulated and real data. If we apply the SVM and RF to the top 2r chi-square-ranked SNPs, where r is the number of SNPs with P-values within the Bonferroni correction, we find that both improve the ranks of causal variants and associated regions and achieve higher power on simulated data. These improvements, however, as well as stability of the SVM and RF rankings, progressively decrease as the cutoff increases to 5r and 10r. As applications we compare the ranks of previously replicated SNPs in real data, associated regions in type 1 diabetes, as provided by the Type 1 Diabetes Consortium, and disease risk prediction accuracies as given by top ranked SNPs by the three methods. Software and webserver are available at http://svmsnps.njit.edu.
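The screening step in the description (count the r SNPs whose P-values pass the Bonferroni correction, then keep the top 2r chi-square-ranked SNPs for re-ranking) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' software: the 2x2 allelic count table, the 0.05 significance level, the function names and the toy counts are all assumptions made for the example, and the SVM and RF re-ranking of the selected SNPs is not shown.

```python
import math

def chi2_1df(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the
    2x2 table [[a, b], [c, d]], e.g. allele counts in cases vs controls.
    The table layout is an assumption of this sketch."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denom

def p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 df.
    For 1 df, P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

def select_top_kr(tables, alpha=0.05, k=2):
    """Return (r, indices of the top k*r SNPs by chi-square statistic).
    r is the number of SNPs with P-value below alpha / (number of SNPs)."""
    m = len(tables)
    stats = [chi2_1df(*t) for t in tables]
    pvals = [p_value_1df(s) for s in stats]
    r = sum(p < alpha / m for p in pvals)
    order = sorted(range(m), key=lambda i: stats[i], reverse=True)
    return r, order[:min(k * r, m)]

# Hypothetical toy data: one (a, b, c, d) table per SNP.
tables = [
    (90, 110, 50, 150),
    (60, 140, 58, 142),
    (85, 115, 55, 145),
    (70, 130, 69, 131),
    (50, 150, 52, 148),
    (64, 136, 60, 140),
]
r, candidates = select_top_kr(tables)
```

The indices in `candidates` would then be passed to a classifier-based ranking method; setting `k=5` or `k=10` corresponds to the larger 5r and 10r cutoffs mentioned in the description.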
format Text
id pubmed-3089490
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3089490 2011-05-09 Nucleic Acids Res Methods Online Oxford University Press 2011-05 2011-02-11 /pmc/articles/PMC3089490/ /pubmed/21317188 http://dx.doi.org/10.1093/nar/gkr064 Text en © The Author(s) 2011. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest
topic Methods Online