Supervised learning-based tagSNP selection for genome-wide disease classifications

Bibliographic Details
Main authors: Liu, Qingzhong; Yang, Jack; Chen, Zhongxue; Yang, Mary Qu; Sung, Andrew H; Huang, Xudong
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386071/
https://www.ncbi.nlm.nih.gov/pubmed/18366619
http://dx.doi.org/10.1186/1471-2164-9-S1-S6
author Liu, Qingzhong
Yang, Jack
Chen, Zhongxue
Yang, Mary Qu
Sung, Andrew H
Huang, Xudong
collection PubMed
description BACKGROUND: Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is to find an optimal subset of SNPs with predicting power for disease status. To find that subset while reducing study burden in terms of time and costs, one can potentially reconcile information redundancy from associations between SNP markers. RESULTS: We have developed a feature selection method named Supervised Recursive Feature Addition (SRFA). This method combines supervised learning and statistical measures for the chosen candidate features/SNPs to reconcile the redundancy information and, in doing so, improve the classification performance in association studies. Additionally, we have proposed a Support Vector based Recursive Feature Addition (SVRFA) scheme in SNP-disease association analysis. CONCLUSIONS: We have proposed using SRFA with different statistical learning classifiers and SVRFA for both SNP selection and disease classification and then applying them to two complex disease data sets. In general, our approaches outperform the well-known feature selection method of Support Vector Machine Recursive Feature Elimination and logic regression-based SNP selection for disease classification in genetic association studies. Our study further indicates that both genetic and environmental variables should be taken into account when doing disease predictions and classifications for the most complex human diseases that have gene-environment interactions.
format Text
id pubmed-2386071
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2386071 2008-05-15 Supervised learning-based tagSNP selection for genome-wide disease classifications Liu, Qingzhong Yang, Jack Chen, Zhongxue Yang, Mary Qu Sung, Andrew H Huang, Xudong BMC Genomics Research
BioMed Central 2008-03-20 /pmc/articles/PMC2386071/ /pubmed/18366619 http://dx.doi.org/10.1186/1471-2164-9-S1-S6 Text en Copyright © 2008 Liu et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Supervised learning-based tagSNP selection for genome-wide disease classifications
topic Research