Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences
BACKGROUND: MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificit...
Main authors: | ElGokhy, Sherin M; ElHefnawi, Mahmoud; Shoukry, Amin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4051165/ https://www.ncbi.nlm.nih.gov/pubmed/24884968 http://dx.doi.org/10.1186/1756-0500-7-286 |
_version_ | 1782320070385467392 |
---|---|
author | ElGokhy, Sherin M; ElHefnawi, Mahmoud; Shoukry, Amin |
author_facet | ElGokhy, Sherin M; ElHefnawi, Mahmoud; Shoukry, Amin |
author_sort | ElGokhy, Sherin M |
collection | PubMed |
description | BACKGROUND: MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons. RESULTS: The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR), which use non-identical features and have been trained on different data. Their decisions are combined using a single-hidden-layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which indicates high performance. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred. The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 highly probable miRNAs have been identified. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs. CONCLUSIONS: The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly promising miRNA hairpin candidates for cloning-based validation. The ensemble's predictions include pre-miRNA candidates that have been validated using miRBase even though some of the base classifiers did not recognize them. |
format | Online Article Text |
id | pubmed-4051165 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4051165 2014-06-20 Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences ElGokhy, Sherin M; ElHefnawi, Mahmoud; Shoukry, Amin BMC Res Notes Research Article BACKGROUND: MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons. RESULTS: The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR), which use non-identical features and have been trained on different data. Their decisions are combined using a single-hidden-layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which indicates high performance. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred. The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 highly probable miRNAs have been identified. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs. CONCLUSIONS: The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly promising miRNA hairpin candidates for cloning-based validation. The ensemble's predictions include pre-miRNA candidates that have been validated using miRBase even though some of the base classifiers did not recognize them. BioMed Central 2014-05-06 /pmc/articles/PMC4051165/ /pubmed/24884968 http://dx.doi.org/10.1186/1756-0500-7-286 Text en Copyright © 2014 ElGokhy et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article ElGokhy, Sherin M ElHefnawi, Mahmoud Shoukry, Amin Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences |
title | Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences |
title_full | Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences |
title_fullStr | Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences |
title_full_unstemmed | Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences |
title_short | Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences |
title_sort | ensemble-based classification approach for micro-rna mining applied on diverse metagenomic sequences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4051165/ https://www.ncbi.nlm.nih.gov/pubmed/24884968 http://dx.doi.org/10.1186/1756-0500-7-286 |
work_keys_str_mv | AT elgokhysherinm ensemblebasedclassificationapproachformicrornaminingappliedondiversemetagenomicsequences AT elhefnawimahmoud ensemblebasedclassificationapproachformicrornaminingappliedondiversemetagenomicsequences AT shoukryamin ensemblebasedclassificationapproachformicrornaminingappliedondiversemetagenomicsequences |
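
The six performance figures quoted in the description are all functions of one confusion matrix, so they can be cross-checked against each other. The sketch below is illustrative only: the function name `binary_metrics` and the counts are assumptions, not taken from the article. The counts (100 real miRNA hairpins, 200 pseudo hairpins) are hypothetical values chosen because they reproduce the reported percentages; the record does not state the actual test-set sizes.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of real miRNAs predicted as miRNA
    specificity = tn / (tn + fp)   # share of pseudo hairpins correctly rejected
    precision = tp / (tp + fp)     # share of positive predictions that are real
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F-measure: harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
    }


# Hypothetical counts, NOT from the article: 100 real miRNAs, 200 pseudo hairpins.
counts = {"tp": 74, "fn": 26, "tn": 194, "fp": 6}
metrics = binary_metrics(**counts)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts the reported values are mutually consistent, which suggests (but does not prove) a test set with roughly twice as many pseudo hairpins as real miRNA hairpins.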