
Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences

BACKGROUND: MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that are identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificit...


Bibliographic Details
Main Authors: ElGokhy, Sherin M, ElHefnawi, Mahmoud, Shoukry, Amin
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4051165/
https://www.ncbi.nlm.nih.gov/pubmed/24884968
http://dx.doi.org/10.1186/1756-0500-7-286
_version_ 1782320070385467392
author ElGokhy, Sherin M
ElHefnawi, Mahmoud
Shoukry, Amin
author_facet ElGokhy, Sherin M
ElHefnawi, Mahmoud
Shoukry, Amin
author_sort ElGokhy, Sherin M
collection PubMed
description BACKGROUND: MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that are identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons. RESULTS: The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR), which use non-identical features and have been trained on different data. Their decisions are combined using a single hidden layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which indicates high performance. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred. The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 sequences have been identified as highly probable miRNAs.
Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs. CONCLUSIONS: The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of high-potential miRNA hairpin candidates for cloning-based methods. Among the ensemble's predictions are pre-miRNA candidates that have been validated using miRBase although some of the base classifiers did not recognize them.
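The performance figures quoted in the description can be cross-checked against each other. The sketch below is not taken from the paper: it computes the six metrics from confusion-matrix counts, using hypothetical counts (74 true positives, 26 false negatives, 194 true negatives, 6 false positives) chosen only because they reproduce the reported rates after rounding. The abstract does not state the actual test-set sizes, so the 1:2 ratio of real to pseudo sequences implied by these counts is an assumption.

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on real miRNA hairpins
    specificity = tn / (tn + fp)          # recall on pseudo hairpins
    precision = tp / (tp + fp)            # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
    }


# Hypothetical counts (assumption, not from the paper): 100 real and 200 pseudo sequences.
example = metrics_from_counts(tp=74, fp=6, tn=194, fn=26)

for name, value in example.items():
    print(f"{name}: {value:.1%}")
```

With these counts the six values round to the percentages given in the description, which suggests the reported figures are mutually consistent; it does not confirm the size or composition of the real test set.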
format Online
Article
Text
id pubmed-4051165
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4051165 2014-06-20 Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences ElGokhy, Sherin M ElHefnawi, Mahmoud Shoukry, Amin BMC Res Notes Research Article BACKGROUND: MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that are identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons. RESULTS: The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR), which use non-identical features and have been trained on different data. Their decisions are combined using a single hidden layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which indicates high performance. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred.
The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 sequences have been identified as highly probable miRNAs. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs. CONCLUSIONS: The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of high-potential miRNA hairpin candidates for cloning-based methods. Among the ensemble's predictions are pre-miRNA candidates that have been validated using miRBase although some of the base classifiers did not recognize them. BioMed Central 2014-05-06 /pmc/articles/PMC4051165/ /pubmed/24884968 http://dx.doi.org/10.1186/1756-0500-7-286 Text en Copyright © 2014 ElGokhy et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
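The abstract describes combining the decisions of four base classifiers with a single hidden layer neural network. As a rough illustration of that idea only, the pure-Python sketch below trains a small network on the binary votes of four base classifiers. Everything specific here is an assumption rather than the authors' implementation: the class name `MetaClassifier`, the three sigmoid hidden units, the plain stochastic gradient descent training loop, and the toy vote table (whose column order Triplet-SVM, MiPred, Virgo, EumiR is hypothetical).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MetaClassifier:
    """Single-hidden-layer network that combines base classifier votes (illustrative)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the toy run reproducible
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict_proba(self, x):
        return self.forward(x)[1]

    def train(self, samples, labels, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                hidden, out = self.forward(x)
                d_out = out - y  # cross-entropy gradient at the output pre-activation
                for j, h in enumerate(hidden):
                    # Backpropagate through hidden unit j before updating w2[j].
                    d_hidden = d_out * self.w2[j] * h * (1.0 - h)
                    self.w2[j] -= lr * d_out * h
                    self.b1[j] -= lr * d_hidden
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hidden * xi
                self.b2 -= lr * d_out


# Toy data (assumption): each row holds the 0/1 votes of the four base classifiers.
votes = [[1, 1, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0], [1, 1, 1, 0],
         [0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = real pre-miRNA, 0 = pseudo hairpin

meta = MetaClassifier(n_inputs=4, n_hidden=3)
meta.train(votes, labels)
print(round(meta.predict_proba([1, 1, 1, 1]), 3))
print(round(meta.predict_proba([0, 0, 0, 0]), 3))
```

The point of a learned combiner over simple majority voting is that it can weight base classifiers unequally; in this toy table, for instance, the second column agrees with the label on every row, and a trained network is free to rely on it more heavily.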
spellingShingle Research Article
ElGokhy, Sherin M
ElHefnawi, Mahmoud
Shoukry, Amin
Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences
title Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences
title_full Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences
title_fullStr Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences
title_full_unstemmed Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences
title_short Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences
title_sort ensemble-based classification approach for micro-rna mining applied on diverse metagenomic sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4051165/
https://www.ncbi.nlm.nih.gov/pubmed/24884968
http://dx.doi.org/10.1186/1756-0500-7-286
work_keys_str_mv AT elgokhysherinm ensemblebasedclassificationapproachformicrornaminingappliedondiversemetagenomicsequences
AT elhefnawimahmoud ensemblebasedclassificationapproachformicrornaminingappliedondiversemetagenomicsequences
AT shoukryamin ensemblebasedclassificationapproachformicrornaminingappliedondiversemetagenomicsequences