SNPs selection using support vector regression and genetic algorithms in GWAS
INTRODUCTION: This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel as the fitness function of a binary genetic al...
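Main authors: | de Oliveira, Fabrízzio Condé; Borges, Carlos Cristiano Hasenclever; Almeida, Fernanda Nascimento; e Silva, Fabyano Fonseca; da Silva Verneque, Rui; da Silva, Marcos Vinicius GB; Arbex, Wagner |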
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243330/ https://www.ncbi.nlm.nih.gov/pubmed/25573332 http://dx.doi.org/10.1186/1471-2164-15-S7-S4 |
_version_ | 1782346090979262464 |
---|---|
author | de Oliveira, Fabrízzio Condé; Borges, Carlos Cristiano Hasenclever; Almeida, Fernanda Nascimento; e Silva, Fabyano Fonseca; da Silva Verneque, Rui; da Silva, Marcos Vinicius GB; Arbex, Wagner |
author_facet | de Oliveira, Fabrízzio Condé; Borges, Carlos Cristiano Hasenclever; Almeida, Fernanda Nascimento; e Silva, Fabyano Fonseca; da Silva Verneque, Rui; da Silva, Marcos Vinicius GB; Arbex, Wagner |
author_sort | de Oliveira, Fabrízzio Condé |
collection | PubMed |
description | INTRODUCTION: This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, considering several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence. RESULTS: The suggested method has shown potential on simulated database 1, with additive effects only, and on the real database. In simulated database 1, which has a total of 1,000 markers, 7 with a major effect on the phenotype and the other 993 SNPs representing noise, the method identified 21 markers. Of these, 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, which initially had 50,752 SNPs, the method reduced the set to 3,073 markers, increasing the accuracy of the model. On simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. CONCLUSIONS: The method suggested in this paper is effective in explaining the real phenotype (PTA for milk): applying the wrapper based on a genetic algorithm and Support Vector Regression with the PUK eliminated many redundant markers, improving the prediction and accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels. |
format | Online Article Text |
id | pubmed-4243330 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4243330 2014-11-26 SNPs selection using support vector regression and genetic algorithms in GWAS de Oliveira, Fabrízzio Condé; Borges, Carlos Cristiano Hasenclever; Almeida, Fernanda Nascimento; e Silva, Fabyano Fonseca; da Silva Verneque, Rui; da Silva, Marcos Vinicius GB; Arbex, Wagner BMC Genomics Research INTRODUCTION: This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, considering several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence. RESULTS: The suggested method has shown potential on simulated database 1, with additive effects only, and on the real database. In simulated database 1, which has a total of 1,000 markers, 7 with a major effect on the phenotype and the other 993 SNPs representing noise, the method identified 21 markers. Of these, 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, which initially had 50,752 SNPs, the method reduced the set to 3,073 markers, increasing the accuracy of the model. On simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. CONCLUSIONS: The method suggested in this paper is effective in explaining the real phenotype (PTA for milk): applying the wrapper based on a genetic algorithm and Support Vector Regression with the PUK eliminated many redundant markers, improving the prediction and accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels.
BioMed Central 2014-10-27 /pmc/articles/PMC4243330/ /pubmed/25573332 http://dx.doi.org/10.1186/1471-2164-15-S7-S4 Text en Copyright © 2014 de Oliveira et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research de Oliveira, Fabrízzio Condé; Borges, Carlos Cristiano Hasenclever; Almeida, Fernanda Nascimento; e Silva, Fabyano Fonseca; da Silva Verneque, Rui; da Silva, Marcos Vinicius GB; Arbex, Wagner SNPs selection using support vector regression and genetic algorithms in GWAS |
title | SNPs selection using support vector regression and genetic algorithms in GWAS |
title_full | SNPs selection using support vector regression and genetic algorithms in GWAS |
title_fullStr | SNPs selection using support vector regression and genetic algorithms in GWAS |
title_full_unstemmed | SNPs selection using support vector regression and genetic algorithms in GWAS |
title_short | SNPs selection using support vector regression and genetic algorithms in GWAS |
title_sort | snps selection using support vector regression and genetic algorithms in gwas |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243330/ https://www.ncbi.nlm.nih.gov/pubmed/25573332 http://dx.doi.org/10.1186/1471-2164-15-S7-S4 |
work_keys_str_mv | AT deoliveirafabrizzioconde snpsselectionusingsupportvectorregressionandgeneticalgorithmsingwas AT borgescarloscristianohasenclever snpsselectionusingsupportvectorregressionandgeneticalgorithmsingwas AT almeidafernandanascimento snpsselectionusingsupportvectorregressionandgeneticalgorithmsingwas AT esilvafabyanofonseca snpsselectionusingsupportvectorregressionandgeneticalgorithmsingwas AT dasilvavernequerui snpsselectionusingsupportvectorregressionandgeneticalgorithmsingwas AT dasilvamarcosviniciusgb snpsselectionusingsupportvectorregressionandgeneticalgorithmsingwas AT arbexwagner snpsselectionusingsupportvectorregressionandgeneticalgorithmsingwas |
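
The description names the Pearson Universal kernel (PUK) and reports selection counts for simulated database 1 without showing either calculation. The sketch below is illustrative only and is not taken from the article. It assumes the Pearson VII form commonly used for the PUK, K(x, y) = 1 / [1 + (2·‖x − y‖·√(2^(1/ω) − 1) / σ)²]^ω, and it assumes genotypes coded as 0/1/2. The function names and the parameter names `sigma` and `omega` are hypothetical. The second helper simply turns the counts quoted in the description (21 selected markers, 5 true positives, 7 relevant SNPs) into precision and recall.

```python
import math


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel between two equal-length vectors.

    Assumed form (not quoted from the record):
    K = 1 / (1 + (2 * ||x - y|| * sqrt(2**(1/omega) - 1) / sigma)**2) ** omega
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if sigma <= 0 or omega <= 0:
        raise ValueError("sigma and omega must be positive")
    # Euclidean distance between the two genotype vectors.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    # sigma sets the peak width, omega sets how heavy the tails are.
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


def selection_metrics(n_selected, n_true_positive, n_relevant):
    """Precision and recall of a marker selection."""
    precision = n_true_positive / n_selected
    recall = n_true_positive / n_relevant
    return precision, recall


if __name__ == "__main__":
    # Two hypothetical individuals, four SNPs each, coded 0/1/2.
    a = [0, 1, 2, 1]
    b = [0, 2, 2, 0]
    print("K(a, a) =", puk_kernel(a, a))
    print("K(a, b) =", puk_kernel(a, b))

    # Counts reported for simulated database 1 in the description.
    precision, recall = selection_metrics(21, 5, 7)
    print("precision =", round(precision, 3))  # 5 / 21
    print("recall    =", round(recall, 3))     # 5 / 7
```

With the counts from the description, precision is 5/21 (about 0.238) and recall is 5/7 (about 0.714). In a wrapper of the kind the description outlines, a kernel like this would be evaluated only on the SNP columns switched on by each binary chromosome; that wiring is omitted here because the record does not specify it.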