An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection app...

Bibliographic Details
Main Authors: Galpert, Deborah; del Río, Sara; Herrera, Francisco; Ancede-Gallardo, Evys; Antunes, Agostinho; Agüero-Chapin, Guillermin
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4641943/
https://www.ncbi.nlm.nih.gov/pubmed/26605337
http://dx.doi.org/10.1155/2015/748681
_version_ 1782400270744944640
author Galpert, Deborah
del Río, Sara
Herrera, Francisco
Ancede-Gallardo, Evys
Antunes, Agostinho
Agüero-Chapin, Guillermin
author_sort Galpert, Deborah
collection PubMed
description Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.
format Online
Article
Text
id pubmed-4641943
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-46419432015-11-24 An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species Galpert, Deborah del Río, Sara Herrera, Francisco Ancede-Gallardo, Evys Antunes, Agostinho Agüero-Chapin, Guillermin Biomed Res Int Research Article Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification. Hindawi Publishing Corporation 2015 2015-10-29 /pmc/articles/PMC4641943/ /pubmed/26605337 http://dx.doi.org/10.1155/2015/748681 Text en Copyright © 2015 Deborah Galpert et al. 
https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4641943/
https://www.ncbi.nlm.nih.gov/pubmed/26605337
http://dx.doi.org/10.1155/2015/748681