
SNPs selection using support vector regression and genetic algorithms in GWAS

INTRODUCTION: This paper proposes a new methodology to simultaneously select the most relevant SNPs markers for the characterization of any measurable phenotype described by a continuous variable using Support Vector Regression with Pearson Universal kernel as fitness function of a binary genetic al...


Bibliographic Details
Main authors: de Oliveira, Fabrízzio Condé, Borges, Carlos Cristiano Hasenclever, Almeida, Fernanda Nascimento, e Silva, Fabyano Fonseca, da Silva Verneque, Rui, da Silva, Marcos Vinicius GB, Arbex, Wagner
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243330/
https://www.ncbi.nlm.nih.gov/pubmed/25573332
http://dx.doi.org/10.1186/1471-2164-15-S7-S4
_version_ 1782346090979262464
author de Oliveira, Fabrízzio Condé
Borges, Carlos Cristiano Hasenclever
Almeida, Fernanda Nascimento
e Silva, Fabyano Fonseca
da Silva Verneque, Rui
da Silva, Marcos Vinicius GB
Arbex, Wagner
collection PubMed
description INTRODUCTION: This paper proposes a new methodology to simultaneously select the most relevant SNP markers for characterizing any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, in that it considers several markers simultaneously to explain the phenotype, and it draws jointly on statistical tools, machine learning and computational intelligence. RESULTS: The suggested method showed potential on simulated database 1, which has additive effects only, and on the real database. In simulated database 1, which has 1,000 markers in total, 7 with a major effect on the phenotype and the other 993 representing noise, the method identified 21 markers. Of these, 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, which initially had 50,752 SNPs, the method reduced the set to 3,073 markers while increasing the accuracy of the model. On simulated database 2, which has additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. CONCLUSIONS: The method suggested in this paper proved effective in explaining the real phenotype (PTA for milk): applying the wrapper based on a genetic algorithm and Support Vector Regression with the Pearson Universal kernel eliminated many redundant markers, improving the prediction and accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels.
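The wrapper described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The kernel follows the commonly published Pearson VII form of the PUK, and the parameter names sigma and omega, the GA settings, the marker indices in CAUSAL and the toy fitness are all assumptions made for illustration. In the paper the fitness comes from Support Vector Regression with the PUK; here a stand-in fitness is used so the sketch runs with the standard library alone.

```python
import math
import random


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel (PUK) between two equal-length vectors.

    sigma and omega are the kernel's width and shape parameters
    (names assumed here; the abstract does not give values).
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


def binary_ga(fitness, n_bits, pop_size=30, generations=40,
              crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Generic binary GA: each bit says whether a marker is selected.

    Uses binary tournament selection, one-point crossover, bit-flip
    mutation and elitism (the best individual is always carried over).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        def pick():
            # Binary tournament: keep the fitter of two random individuals.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        children = [best[:]]  # elitism
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical "causal" marker indices, used only by the toy fitness below.
CAUSAL = {2, 5, 11}


def toy_fitness(mask):
    """Stand-in for the SVR-based fitness: reward causal hits, penalize size."""
    hits = sum(1 for i in CAUSAL if mask[i])
    return hits - 0.1 * sum(mask)


N_MARKERS = 20
best_mask = binary_ga(toy_fitness, n_bits=N_MARKERS)
selected = [i for i, bit in enumerate(best_mask) if bit]
print("selected marker indices:", selected)
print("PUK of two identical genotype vectors:", puk_kernel([0, 1, 2], [0, 1, 2]))
```

To move closer to the method in the abstract, toy_fitness would be replaced by a function that trains a regression model with the PUK on the selected marker columns and returns its validation accuracy; that step is left out here because it depends on a specific SVR library.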
format Online
Article
Text
id pubmed-4243330
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4243330 2014-11-26 SNPs selection using support vector regression and genetic algorithms in GWAS de Oliveira, Fabrízzio Condé Borges, Carlos Cristiano Hasenclever Almeida, Fernanda Nascimento e Silva, Fabyano Fonseca da Silva Verneque, Rui da Silva, Marcos Vinicius GB Arbex, Wagner BMC Genomics Research
BioMed Central 2014-10-27 /pmc/articles/PMC4243330/ /pubmed/25573332 http://dx.doi.org/10.1186/1471-2164-15-S7-S4 Text en Copyright © 2014 de Oliveira et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SNPs selection using support vector regression and genetic algorithms in GWAS
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243330/
https://www.ncbi.nlm.nih.gov/pubmed/25573332
http://dx.doi.org/10.1186/1471-2164-15-S7-S4