Phenotype prediction from genome-wide association studies: application to smoking behaviors

Bibliographic details
Main authors: Yoon, Dankyu; Kim, Young Jin; Park, Taesung
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521177/
https://www.ncbi.nlm.nih.gov/pubmed/23281841
http://dx.doi.org/10.1186/1752-0509-6-S2-S11
collection PubMed
description BACKGROUND: The great success of genome-wide association studies has drawn more attention to the personal genome and to clinical applications such as diagnosis and disease risk prediction. However, previous prediction studies using known disease-associated loci have not been successful (area under the curve (AUC) of 0.55 to 0.68 for type 2 diabetes and coronary heart disease). Reasons for this poor predictability include the small number of known disease-associated loci, simple analyses that do not consider the complexity of the phenotype, and the limited number of features used for prediction.
METHODS: In this research, we thoroughly investigated how the choice of feature selection method and prediction algorithm affects prediction performance. In particular, we considered the following feature selection and prediction methods: regression analysis, regularized regression analysis, linear discriminant analysis, non-linear support vector machine, and random forest. For each of these methods, we studied the effects of feature selection and of the number of features on prediction. Our investigation was based on 8,842 Korean individuals genotyped with the Affymetrix SNP Array 5.0, and the prediction target was smoking behavior.
RESULTS: To assess the effect of feature selection methods on prediction performance, the selected features were used for prediction and the AUC was measured. For feature selection, support vector machine (SVM) and elastic-net (EN) performed better than linear discriminant analysis (LDA), random forest (RF), and simple logistic regression (LR). For prediction, SVM showed the best performance in terms of AUC. With fewer than 100 SNPs, EN was the best prediction method, whereas SVM was the best when more than 400 SNPs were used.
CONCLUSIONS: Across the combinations of feature selection and prediction methods, SVM showed the best performance in both feature selection and prediction.
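The abstract describes a two-stage design: select a subset of SNPs, then predict the phenotype from them and score the predictions by AUC. The sketch below illustrates only that general shape, using the Python standard library, and is not the authors' pipeline. Its assumptions are labeled here and in the code: genotypes are coded additively as 0/1/2 allele counts; SNPs are ranked by squared Pearson correlation with case status (a simple univariate filter closely related to the Cochran-Armitage trend test, standing in for the LR, EN, LDA, SVM, and RF methods the study compared); the predictor is a hypothetical signed allele-count score rather than a trained classifier; and AUC is computed in its Mann-Whitney form. All function names and the toy data are hypothetical.

```python
# Minimal sketch, NOT the study's pipeline: an assumed univariate SNP filter,
# a hypothetical additive risk score, and AUC. Standard library only.
from math import sqrt
from typing import List, Sequence


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_top_k(genotypes: List[List[int]], labels: List[int], k: int) -> List[int]:
    """Assumed feature selection: keep the k SNP columns with the largest
    squared correlation between 0/1/2 dosage and case status."""
    n_snps = len(genotypes[0])
    r2 = [correlation([row[j] for row in genotypes], labels) ** 2 for j in range(n_snps)]
    return sorted(range(n_snps), key=lambda j: r2[j], reverse=True)[:k]


def signed_allele_score(genotypes: List[List[int]], labels: List[int],
                        selected: List[int]) -> List[float]:
    """Hypothetical predictor: add the dosage of positively associated SNPs
    and subtract the dosage of negatively associated ones."""
    signs = []
    for j in selected:
        r = correlation([row[j] for row in genotypes], labels)
        signs.append(1.0 if r > 0 else -1.0 if r < 0 else 0.0)
    return [sum(s * row[j] for s, j in zip(signs, selected)) for row in genotypes]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random case outscores a random
    control (Mann-Whitney form; ties count one half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 6 individuals (rows) x 3 SNPs (columns);
    # label 1 = case (e.g. smoker), 0 = control.
    G = [[2, 1, 0], [2, 0, 0], [2, 1, 1], [0, 1, 2], [0, 0, 2], [0, 1, 1]]
    y = [1, 1, 1, 0, 0, 0]
    chosen = select_top_k(G, y, k=2)
    print(chosen)                                     # [0, 2]
    print(auc(signed_allele_score(G, y, chosen), y))  # 1.0 (in-sample)
```

The toy AUC is computed on the same individuals used for selection, so it is optimistic by construction; a fair comparison of feature selection and prediction methods would estimate AUC on held-out data, for example with cross-validation. Swapping `signed_allele_score` for a trained classifier, and varying `k`, mirrors the kind of comparison the abstract reports between small (fewer than 100) and large (more than 400) SNP sets.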
id pubmed-3521177
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Syst Biol, Proceedings; published by BioMed Central on 2012-12-12
rights Copyright ©2012 Yoon et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Proceedings