Improving classification of mature microRNA by solving class imbalance problem
MicroRNAs (miRNAs) are ~20–25-nucleotide non-coding RNAs that regulate gene expression at the post-transcriptional level. The accuracy of identifying the start site of a mature miRNA from a given pre-miRNA remains low. It is worth noting that mature miRNA prediction is a class-imbalanced probl...
Main Authors: | Wang, Ying; Li, Xiaoye; Tao, Bairui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group, 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4867574/ https://www.ncbi.nlm.nih.gov/pubmed/27181057 http://dx.doi.org/10.1038/srep25941 |
_version_ | 1782432045113278464 |
---|---|
author | Wang, Ying; Li, Xiaoye; Tao, Bairui |
author_sort | Wang, Ying |
collection | PubMed |
description | MicroRNAs (miRNAs) are ~20–25-nucleotide non-coding RNAs that regulate gene expression at the post-transcriptional level. The accuracy of identifying the start site of a mature miRNA from a given pre-miRNA remains low. It is worth noting that mature miRNA prediction is a class-imbalanced problem, which also contributes to the unsatisfactory performance of existing methods. We improved classifier prediction accuracy by using balanced datasets and present MatFind, which identifies 5′ mature miRNA candidates from their pre-miRNAs using ensemble SVM classifiers built on the idea of AdaBoost. First, the balanced dataset was extracted using the K-nearest neighbor algorithm. Second, multiple SVM classifiers were trained in sequence on the balanced datasets using the represented features. Finally, all SVM classifiers were combined to form the ensemble classifier. Our results on an independent testing dataset show that the proposed method is more efficient than one that does not treat the class imbalance problem. Moreover, MatFind achieves much higher classification accuracy than three other approaches. The ensemble SVM classifiers and balanced datasets can address the class-imbalance problem and improve classifier performance for mature miRNA identification. MatFind is an accurate and fast method for 5′ mature miRNA identification. |
format | Online Article Text |
id | pubmed-4867574 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-4867574 2016-05-31 Improving classification of mature microRNA by solving class imbalance problem. Wang, Ying; Li, Xiaoye; Tao, Bairui. Sci Rep. Article. Nature Publishing Group, 2016-05-16. /pmc/articles/PMC4867574/ /pubmed/27181057 http://dx.doi.org/10.1038/srep25941 Text en. Copyright © 2016, Macmillan Publishers Limited. http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Wang, Ying Li, Xiaoye Tao, Bairui Improving classification of mature microRNA by solving class imbalance problem |
title | Improving classification of mature microRNA by solving class imbalance problem |
title_sort | improving classification of mature microrna by solving class imbalance problem |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4867574/ https://www.ncbi.nlm.nih.gov/pubmed/27181057 http://dx.doi.org/10.1038/srep25941 |
work_keys_str_mv | AT wangying improvingclassificationofmaturemicrornabysolvingclassimbalanceproblem AT lixiaoye improvingclassificationofmaturemicrornabysolvingclassimbalanceproblem AT taobairui improvingclassificationofmaturemicrornabysolvingclassimbalanceproblem |
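
The description above outlines three steps: extract a balanced dataset with K-nearest neighbors, train several classifiers in sequence, and combine them into an ensemble. The record does not say how the neighbors are selected or how the votes are weighted, so the following is only a minimal sketch under stated assumptions: negatives are kept when they are among the `k` nearest to some positive, votes use the standard AdaBoost weight `0.5 * ln((1 - err) / err)`, and simple threshold rules stand in for the trained SVMs. All function names, data points, and error rates are hypothetical.

```python
import math

def knn_balanced_subset(positives, negatives, k=1):
    """Undersample negatives: keep each positive's k nearest negatives.

    Assumption: this selection rule is illustrative; the record only says
    the balanced dataset is extracted with K-nearest neighbors.
    """
    chosen = set()
    for p in positives:
        ranked = sorted(range(len(negatives)),
                        key=lambda i: math.dist(p, negatives[i]))
        chosen.update(ranked[:k])
    return [negatives[i] for i in sorted(chosen)]

def classifier_weight(error_rate):
    """AdaBoost-style vote weight for a classifier with the given error."""
    eps = 1e-10  # clamp to avoid log(0) and division by zero
    err = min(max(error_rate, eps), 1 - eps)
    return 0.5 * math.log((1 - err) / err)

def ensemble_predict(classifiers, weights, x):
    """Weighted vote over classifiers that each return +1 or -1."""
    score = sum(w * clf(x) for clf, w in zip(classifiers, weights))
    return 1 if score >= 0 else -1

# Toy 2-D feature vectors (hypothetical): 2 positives, 5 negatives.
positives = [(0.0, 0.0), (1.0, 1.0)]
negatives = [(0.1, 0.0), (5.0, 5.0), (1.0, 1.2), (9.0, 9.0), (8.0, 1.0)]
balanced_negatives = knn_balanced_subset(positives, negatives, k=1)

# Stand-in rules in place of trained SVMs, with assumed training errors.
classifiers = [
    lambda x: 1 if x[0] < 2 else -1,
    lambda x: 1 if x[1] < 0.5 else -1,
    lambda x: -1,
]
weights = [classifier_weight(e) for e in (0.1, 0.3, 0.4)]
prediction = ensemble_predict(classifiers, weights, (1.0, 1.0))
```

In this toy run the balanced set keeps one negative per positive, and the most accurate stand-in rule outvotes the two weaker ones. A real pipeline would replace the rules with SVMs trained on the features of the balanced datasets.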