Heterogeneous ensemble approach with discriminative features and modified-SMOTEbagging for pre-miRNA classification

Bibliographic Details
Main authors: Lertampaiporn, Supatcha; Thammarongtham, Chinae; Nukoolkit, Chakarida; Kaewkamnerdpong, Boonserm; Ruengjitchatchawalya, Marasri
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2013
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3592496/
https://www.ncbi.nlm.nih.gov/pubmed/23012261
http://dx.doi.org/10.1093/nar/gks878
author Lertampaiporn, Supatcha
Thammarongtham, Chinae
Nukoolkit, Chakarida
Kaewkamnerdpong, Boonserm
Ruengjitchatchawalya, Marasri
collection PubMed
description An ensemble classifier approach for microRNA precursor (pre-miRNA) classification was proposed that combines a set of heterogeneous algorithms, namely support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF), and aggregates their predictions through a voting system. In addition, classification performance was improved by using discriminative features, self-containment and its derivatives, which reflect the distinctive structural robustness of pre-miRNAs. These features are applicable across different species. Applying two preprocessing methods, correlation-based feature selection (CFS) with a genetic algorithm (GA) search method and a modified Synthetic Minority Oversampling Technique (SMOTE) bagging rebalancing method (modified-SMOTEbagging), further improved the performance of the ensemble. The overall prediction accuracy obtained over 10 runs of 5-fold cross-validation (CV) was 96.54%, with a sensitivity of 94.8% and a specificity of 98.3%, a better trade-off between sensitivity and specificity than that of other state-of-the-art methods. When applied to animal, plant and virus pre-miRNAs, the ensemble model achieved high accuracy (>93%). Analysis of the selected discriminative features also suggests that pre-miRNAs possess higher intrinsic structural robustness than other stem loops. Our heterogeneous ensemble method gave relatively more reliable predictions than single classifiers. Our program is available at http://ncrna-pred.com/premiRNA.html.
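The abstract names the building blocks of the pipeline but not their implementation. As a rough illustration only, the Python sketch below shows generic versions of three of them, assuming binary labels (1 = pre-miRNA, 0 = other stem loop) and feature vectors stored as plain lists: SMOTE interpolation, one class-balanced bag built in the textbook SMOTEBagging style, and hard majority voting scored by accuracy, sensitivity and specificity. The function names (smote, smote_bag, majority_vote, evaluate) are hypothetical. This is standard SMOTE/SMOTEBagging, not the authors' modified-SMOTEbagging, and the SVM, kNN and RF base classifiers and the CFS/GA feature selection are not implemented.

```python
# Hypothetical sketch: generic SMOTE / SMOTEBagging and hard voting,
# NOT the authors' modified-SMOTEbagging.
# Assumed labels: 1 = pre-miRNA (positive), 0 = other stem loop.
import random
from collections import Counter


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by SMOTE interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    if rng is None:
        rng = random.Random(0)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


def smote_bag(majority, minority, rng):
    """Build one class-balanced bag: bootstrap each class, then top up
    the minority class with SMOTE samples until both sizes match."""
    maj_bag = [rng.choice(majority) for _ in majority]
    min_bag = [rng.choice(minority) for _ in minority]
    shortfall = max(0, len(maj_bag) - len(min_bag))
    return maj_bag, min_bag + smote(minority, shortfall, rng=rng)


def majority_vote(predictions):
    """Hard voting over one label list per base classifier.

    With an odd number of voters and two classes there are no ties.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return (tp + tn) / len(pairs), tp / (tp + fn), tn / (tn + fp)
```

For example, with hypothetical base-classifier outputs svm = [1, 1, 0, 1, 0, 0, 1, 0], knn = [1, 0, 1, 1, 0, 0, 0, 0] and rf = [1, 1, 1, 0, 0, 1, 0, 0] against y_true = [1, 1, 1, 1, 0, 0, 0, 0], each classifier alone makes one or two errors, yet majority_vote([svm, knn, rf]) recovers y_true and evaluate returns (1.0, 1.0, 1.0). These toy values only illustrate voting and are not results from the paper. In a cross-validation setting, rebalancing would normally be applied to the training folds only, so that synthetic samples never appear in a test fold.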
format Online
Article
Text
id pubmed-3592496
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3592496 2013-03-08. Nucleic Acids Res, Methods Online.
Oxford University Press, 2013-01 (published online 2012-09-24). /pmc/articles/PMC3592496/ /pubmed/23012261 http://dx.doi.org/10.1093/nar/gks878 Text en. © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Heterogeneous ensemble approach with discriminative features and modified-SMOTEbagging for pre-miRNA classification
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3592496/
https://www.ncbi.nlm.nih.gov/pubmed/23012261
http://dx.doi.org/10.1093/nar/gks878