
Improving classification of mature microRNA by solving class imbalance problem

MicroRNAs (miRNAs) are ~20–25-nucleotide non-coding RNAs that regulate gene expression at the post-transcriptional level. The accuracy of identifying the start site of a mature miRNA from a given pre-miRNA remains low. It is worth noting that mature miRNA prediction is a class-imbalanced probl...


Bibliographic Details
Main authors: Wang, Ying; Li, Xiaoye; Tao, Bairui
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4867574/
https://www.ncbi.nlm.nih.gov/pubmed/27181057
http://dx.doi.org/10.1038/srep25941
author Wang, Ying
Li, Xiaoye
Tao, Bairui
collection PubMed
description MicroRNAs (miRNAs) are ~20–25-nucleotide non-coding RNAs that regulate gene expression at the post-transcriptional level. The accuracy of identifying the start site of a mature miRNA from a given pre-miRNA remains low. It is worth noting that mature miRNA prediction is a class-imbalanced problem, which also contributes to the unsatisfactory performance of existing methods. We improved the prediction accuracy of the classifier by using balanced datasets and present MatFind, which identifies 5′ mature miRNA candidates from their pre-miRNAs using ensemble SVM classifiers built on the idea of AdaBoost. First, a balanced dataset was extracted using the K-nearest neighbor algorithm. Second, multiple SVM classifiers were trained in sequence on the balanced datasets, based on the represented features. Finally, all SVM classifiers were combined to form the ensemble classifier. Our results on an independent testing dataset show that the proposed method is more efficient than one that does not treat the class imbalance problem. Moreover, MatFind achieves much higher classification accuracy than three other approaches. The ensemble SVM classifiers and balanced datasets can address the class imbalance problem and improve classifier performance for mature miRNA identification. MatFind is an accurate and fast method for 5′ mature miRNA identification.
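The three steps named in the description (balance with K-nearest neighbors, train several SVMs in sequence, combine them AdaBoost-style) can be illustrated with a small sketch. This is not the MatFind implementation: the abstract does not say how KNN selects samples, which kernel or features are used, or how the classifiers are weighted. The sketch assumes that negatives closest to the positive class are kept, uses a Pegasos-style linear SVM as the base learner, and applies standard AdaBoost reweighting. All function names and the toy data are hypothetical.

```python
import math
import random

def knn_balance(pos, neg, k=3):
    """Undersample negatives: keep the len(pos) negatives whose mean distance
    to their k nearest positives is smallest (an assumed selection rule)."""
    def score(n):
        dists = sorted(math.dist(n, p) for p in pos)
        return sum(dists[:k]) / min(k, len(dists))
    return sorted(neg, key=score)[:len(pos)]

def train_linear_svm(X, y, sample_w, lam=0.01, epochs=50, seed=0):
    """Pegasos-style linear SVM (assumed base learner). Training points are
    drawn in proportion to sample_w, so hard samples are seen more often."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)          # last entry acts as the bias
    idx = range(len(X))
    for t in range(1, epochs * len(X) + 1):
        i = rng.choices(idx, weights=sample_w)[0]
        xi = list(X[i]) + [1.0]
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(a * b for a, b in zip(w, xi))
        w = [(1.0 - eta * lam) * a for a in w]            # regularization shrink
        if margin < 1.0:                                   # hinge-loss violation
            w = [a + eta * y[i] * b for a, b in zip(w, xi)]
    return w

def predict_one(w, x):
    s = sum(a * b for a, b in zip(w, list(x) + [1.0]))
    return 1 if s >= 0 else -1

def train_ensemble(X, y, rounds=3):
    """Train SVMs one after another with standard AdaBoost reweighting."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for r in range(rounds):
        w = train_linear_svm(X, y, weights, seed=r)
        preds = [predict_one(w, x) for x in X]
        err = sum(wt for wt, p, t in zip(weights, preds, y) if p != t)
        err = min(max(err, 1e-10), 1.0 - 1e-10)            # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, w))
        weights = [wt * math.exp(-alpha * t * p)
                   for wt, p, t in zip(weights, preds, y)]
        total = sum(weights)
        weights = [wt / total for wt in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all SVMs in the ensemble."""
    score = sum(alpha * predict_one(w, x) for alpha, w in ensemble)
    return 1 if score >= 0 else -1

# Toy, hypothetical 2-D data: 5 positives versus 36 negatives.
pos = [(2.0, 2.0), (2.5, 1.5), (1.5, 2.5), (3.0, 2.0), (2.0, 3.0)]
neg = [(-1.0 - 0.5 * i, -1.0 - 0.5 * j) for i in range(6) for j in range(6)]

neg_bal = knn_balance(pos, neg)           # as many negatives as positives
X = pos + neg_bal
y = [1] * len(pos) + [-1] * len(neg_bal)
ensemble = train_ensemble(X, y)
```

In this sketch the balancing step runs once before boosting. Whether MatFind instead draws a fresh balanced subset for each classifier is not stated in the abstract.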
format Online
Article
Text
id pubmed-4867574
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4867574 2016-05-31 Improving classification of mature microRNA by solving class imbalance problem Wang, Ying Li, Xiaoye Tao, Bairui Sci Rep Article Nature Publishing Group 2016-05-16 /pmc/articles/PMC4867574/ /pubmed/27181057 http://dx.doi.org/10.1038/srep25941 Text en Copyright © 2016, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Improving classification of mature microRNA by solving class imbalance problem
topic Article