Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest
We study the number of causal variants and associated regions identified by top SNPs in rankings given by the popular 1 df chi-squared statistic, support vector machine (SVM) and the random forest (RF) on simulated and real data. If we apply the SVM and RF to the top 2r chi-square-ranked SNPs, where...
Main Authors: | Roshan, Usman; Chikkagoudar, Satish; Wei, Zhi; Wang, Kai; Hakonarson, Hakon |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2011 |
Subjects: | Methods Online |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3089490/ https://www.ncbi.nlm.nih.gov/pubmed/21317188 http://dx.doi.org/10.1093/nar/gkr064 |
_version_ | 1782203055318499328 |
---|---|
author | Roshan, Usman Chikkagoudar, Satish Wei, Zhi Wang, Kai Hakonarson, Hakon |
author_sort | Roshan, Usman |
collection | PubMed |
description | We study the number of causal variants and associated regions identified by top SNPs in rankings given by the popular 1 df chi-squared statistic, support vector machine (SVM) and the random forest (RF) on simulated and real data. If we apply the SVM and RF to the top 2r chi-square-ranked SNPs, where r is the number of SNPs with P-values within the Bonferroni correction, we find that both improve the ranks of causal variants and associated regions and achieve higher power on simulated data. These improvements, however, as well as stability of the SVM and RF rankings, progressively decrease as the cutoff increases to 5r and 10r. As applications we compare the ranks of previously replicated SNPs in real data, associated regions in type 1 diabetes, as provided by the Type 1 Diabetes Consortium, and disease risk prediction accuracies as given by top ranked SNPs by the three methods. Software and webserver are available at http://svmsnps.njit.edu. |
format | Text |
id | pubmed-3089490 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3089490 2011-05-09 Nucleic Acids Res Methods Online Oxford University Press 2011-05 2011-02-11 /pmc/articles/PMC3089490/ /pubmed/21317188 http://dx.doi.org/10.1093/nar/gkr064 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3089490/ https://www.ncbi.nlm.nih.gov/pubmed/21317188 http://dx.doi.org/10.1093/nar/gkr064 |
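
The description in this record says the SVM and RF are applied to the top 2r chi-square-ranked SNPs, where r is the number of SNPs with P-values within the Bonferroni correction. The sketch below illustrates only that candidate-selection step, not the authors' software. It assumes the 1 df statistic is computed on a 2x2 table of allele counts (cases vs. controls) and uses a significance level of 0.05. The function names, the `multiplier` parameter, and the toy SNP identifiers and counts are all hypothetical.

```python
import math


def allelic_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """1 df chi-squared statistic for a 2x2 allele-count table (assumed test form).

    case_alt / ctrl_alt are counts of one allele; the totals are the number
    of alleles observed in each group.
    """
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def chi2_pvalue_1df(stat):
    """Upper-tail P-value of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def select_candidates(counts, alpha=0.05, multiplier=2):
    """Return (r, top multiplier*r SNP ids ranked by chi-squared statistic).

    counts maps a SNP id to (case_alt, case_total, ctrl_alt, ctrl_total).
    r is the number of SNPs whose P-value passes the Bonferroni threshold
    alpha / (number of SNPs tested).
    """
    stats = {snp: allelic_chi2(*table) for snp, table in counts.items()}
    ranked = sorted(stats, key=stats.get, reverse=True)
    threshold = alpha / len(counts)
    r = sum(1 for snp in ranked if chi2_pvalue_1df(stats[snp]) <= threshold)
    return r, ranked[: multiplier * r]


# Hypothetical toy data: six SNPs, 200 alleles per group.
TOY_COUNTS = {
    "snp_a": (140, 200, 80, 200),
    "snp_b": (120, 200, 90, 200),
    "snp_c": (110, 200, 95, 200),
    "snp_d": (105, 200, 95, 200),
    "snp_e": (100, 200, 100, 200),
    "snp_f": (102, 200, 99, 200),
}

r, candidates = select_candidates(TOY_COUNTS)
print(r, candidates)  # prints: 2 ['snp_a', 'snp_b', 'snp_c', 'snp_d']
```

In this toy run two SNPs pass the Bonferroni threshold, so the four top-ranked SNPs would be handed to a second-stage ranker. Setting `multiplier` to 5 or 10 corresponds to the 5r and 10r cutoffs mentioned in the description. The SVM or RF re-ranking itself is not shown here.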