Heterogeneous ensemble approach with discriminative features and modified-SMOTEbagging for pre-miRNA classification
Main authors: | Lertampaiporn, Supatcha; Thammarongtham, Chinae; Nukoolkit, Chakarida; Kaewkamnerdpong, Boonserm; Ruengjitchatchawalya, Marasri |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2013 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3592496/ https://www.ncbi.nlm.nih.gov/pubmed/23012261 http://dx.doi.org/10.1093/nar/gks878 |
collection | PubMed |
description | An ensemble classifier approach for microRNA precursor (pre-miRNA) classification was proposed, based on combining a set of heterogeneous algorithms, including support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF), and aggregating their predictions through a voting system. In addition to the proposed algorithm, classification performance was improved by using discriminative features, self-containment and its derivatives, which reflect the unique structural robustness of pre-miRNAs. These features are applicable across different species. Applying two preprocessing methods, correlation-based feature selection (CFS) with a genetic algorithm (GA) search method and a modified Synthetic Minority Oversampling Technique (SMOTE) bagging rebalancing method, further improved the performance of the ensemble. The overall prediction accuracy obtained from 10 runs of 5-fold cross-validation (CV) was 96.54%, with a sensitivity of 94.8% and a specificity of 98.3%; this is a better trade-off between sensitivity and specificity than that of other state-of-the-art methods. When applied to animal, plant and virus pre-miRNAs, the ensemble model achieved high accuracy (>93%). The selected set of discriminative features also suggests that pre-miRNAs possess higher intrinsic structural robustness than other stem loops. Our heterogeneous ensemble method gave relatively more reliable predictions than single classifiers. Our program is available at http://ncrna-pred.com/premiRNA.html. |
id | pubmed-3592496 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | Nucleic Acids Res, Methods Online. Oxford University Press, 2013-01; published online 2012-09-24. Record pubmed-3592496, dated 2013-03-08. |
rights | © The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Methods Online |
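
The abstract names SMOTE-based rebalancing inside bagging and a voting step over SVM, kNN and RF, but this record does not describe the authors' modification to SMOTEbagging. The following Python sketch therefore shows only the standard SMOTE interpolation rule, a simplified balanced bag, and hard majority voting. The function names (`smote`, `smote_balanced_bag`, `majority_vote`), the labels (0 for the majority class, 1 for the minority class) and the neighbour count `k` are illustrative assumptions, not the published implementation.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote(minority: List[Vector], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Vector]:
    """Standard SMOTE: each synthetic point lies on the line segment between
    a random minority sample and one of its k nearest minority neighbours."""
    if rng is None:
        rng = random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest *other* minority samples.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


def smote_balanced_bag(majority: List[Vector], minority: List[Vector],
                       rng: random.Random, k: int = 5
                       ) -> Tuple[List[Vector], List[int]]:
    """One simplified SMOTEBagging bag (hypothetical helper): bootstrap each
    class, then SMOTE the minority up to the size of the majority sample."""
    maj = [rng.choice(majority) for _ in majority]
    mino = [rng.choice(minority) for _ in minority]
    deficit = max(0, len(maj) - len(mino))
    mino = mino + smote(mino, deficit, k=k, rng=rng)
    X = maj + mino
    y = [0] * len(maj) + [1] * len(mino)  # assumed labels: 0 majority, 1 minority
    return X, y


def majority_vote(labels: Sequence[int]) -> int:
    """Hard voting: the label predicted by most base classifiers wins.
    With three voters and two classes a tie cannot occur."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)
    pos = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    neg = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [5.5, 5.5]]
    X, y = smote_balanced_bag(neg, pos, rng, k=2)
    print(y.count(0), y.count(1))  # 5 5
    # Votes from three base classifiers (e.g. SVM, kNN, RF) for one sample:
    print(majority_vote([1, 0, 1]))  # 1
```

In a full bagging ensemble, each bag would train its own base classifiers and every bag's prediction would feed the vote; how the published method weights or combines bags is not stated here.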
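
The reported figures come from 10 runs of 5-fold cross-validation, summarised as accuracy, sensitivity and specificity. A minimal Python sketch of that evaluation loop follows, assuming plain (unstratified) shuffled folds and per-fold averaging of the metrics; the record does not say whether the authors stratified folds or pooled counts, and the helper names are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, Iterator, List, Sequence, Tuple


def repeated_kfold(n_samples: int, k: int = 5, repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for `repeats` shuffled runs of k-fold CV.
    Plain (unstratified) splitting is an assumption made for this sketch."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[f::k] for f in range(k)]
        for f in range(k):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, folds[f]


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from one fold's confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def average_over_folds(counts: Sequence[Tuple[int, int, int, int]]) -> Dict[str, float]:
    """Mean of each metric over all folds (assumed averaging scheme)."""
    per_fold = [confusion_metrics(*c) for c in counts]
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}
```

Each of the 50 splits would be used to train the ensemble and tally true/false positives and negatives on the held-out fold. Pooling the counts over all folds before computing the metrics is an equally common choice and can give slightly different figures.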
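
Correlation-based feature selection (CFS), as usually defined, scores a subset $S$ of $k$ features with the merit heuristic below, where $\overline{r_{cf}}$ is the mean feature-class correlation and $\overline{r_{ff}}$ the mean feature-feature inter-correlation. A GA search would use this merit as its fitness when evolving candidate subsets; how the authors configured the GA is not given in this record.

```latex
M_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}
```

For instance, two features each correlated 0.6 with the class and 0.2 with each other give $M_S = 1.2/\sqrt{2.4} \approx 0.775$, higher than either feature alone (0.6), because their redundancy is low.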