Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs
BACKGROUND: Distinction between pre-microRNAs (precursor microRNAs) and length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanism of RNA biological processes. Machine learning techniques have been widely applied to deal with this challenging problem. However, most of them ma...
Main authors: | Ma, Yuanlin, Yu, Zuguo, Han, Guosheng, Li, Jinyan, Anh, Vo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311913/ https://www.ncbi.nlm.nih.gov/pubmed/30598066 http://dx.doi.org/10.1186/s12859-018-2518-2 |
_version_ | 1783383699954335744 |
---|---|
author | Ma, Yuanlin Yu, Zuguo Han, Guosheng Li, Jinyan Anh, Vo |
author_facet | Ma, Yuanlin Yu, Zuguo Han, Guosheng Li, Jinyan Anh, Vo |
author_sort | Ma, Yuanlin |
collection | PubMed |
description | BACKGROUND: Distinction between pre-microRNAs (precursor microRNAs) and length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanism of RNA biological processes. Machine learning techniques have been widely applied to deal with this challenging problem. However, most of them mainly focus on secondary structure information of pre-microRNAs, while ignoring sequence-order information and sequence evolution information. RESULTS: We use new features for the machine learning algorithms to improve the classification performance by characterizing both sequence order evolution information and secondary structure graphs. We developed three steps to extract these features of pre-microRNAs. We first extract features from PSI-BLAST profiles and Hilbert-Huang transforms, which contain rich sequence evolution information and sequence-order information respectively. We then obtain properties of small molecular networks of pre-microRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they can depict both global and local characteristics of pre-microRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is adopted before a support vector machine (SVM) is applied as our classifier. The constructed classification model is named MicroRNA-NHPred. The performance of MicroRNA-NHPred is high and stable, and better than that of state-of-the-art methods, achieving an accuracy of up to 94.83% on the same benchmark datasets. CONCLUSIONS: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the sequences and secondary structures, which is capable of characterizing the sequence evolution information and sequence-order information, and global and local information of pre-microRNA secondary structures. MicroRNA-NHPred is a valuable method for pre-microRNA identification. The source code of our method can be downloaded from https://github.com/myl446/MicroRNA-NHPred. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2518-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6311913 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6311913 2019-01-07 Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs Ma, Yuanlin Yu, Zuguo Han, Guosheng Li, Jinyan Anh, Vo BMC Bioinformatics Research BACKGROUND: Distinction between pre-microRNAs (precursor microRNAs) and length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanism of RNA biological processes. Machine learning techniques have been widely applied to deal with this challenging problem. However, most of them mainly focus on secondary structure information of pre-microRNAs, while ignoring sequence-order information and sequence evolution information. RESULTS: We use new features for the machine learning algorithms to improve the classification performance by characterizing both sequence order evolution information and secondary structure graphs. We developed three steps to extract these features of pre-microRNAs. We first extract features from PSI-BLAST profiles and Hilbert-Huang transforms, which contain rich sequence evolution information and sequence-order information respectively. We then obtain properties of small molecular networks of pre-microRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they can depict both global and local characteristics of pre-microRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is adopted before a support vector machine (SVM) is applied as our classifier. The constructed classification model is named MicroRNA-NHPred. The performance of MicroRNA-NHPred is high and stable, and better than that of state-of-the-art methods, achieving an accuracy of up to 94.83% on the same benchmark datasets. CONCLUSIONS: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the sequences and secondary structures, which is capable of characterizing the sequence evolution information and sequence-order information, and global and local information of pre-microRNA secondary structures. MicroRNA-NHPred is a valuable method for pre-microRNA identification. The source code of our method can be downloaded from https://github.com/myl446/MicroRNA-NHPred. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2518-2) contains supplementary material, which is available to authorized users. BioMed Central 2018-12-31 /pmc/articles/PMC6311913/ /pubmed/30598066 http://dx.doi.org/10.1186/s12859-018-2518-2 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Ma, Yuanlin Yu, Zuguo Han, Guosheng Li, Jinyan Anh, Vo Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs |
title | Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs |
title_full | Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs |
title_fullStr | Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs |
title_full_unstemmed | Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs |
title_short | Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs |
title_sort | identification of pre-micrornas by characterizing their sequence order evolution information and secondary structure graphs |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311913/ https://www.ncbi.nlm.nih.gov/pubmed/30598066 http://dx.doi.org/10.1186/s12859-018-2518-2 |
work_keys_str_mv | AT mayuanlin identificationofpremicrornasbycharacterizingtheirsequenceorderevolutioninformationandsecondarystructuregraphs AT yuzuguo identificationofpremicrornasbycharacterizingtheirsequenceorderevolutioninformationandsecondarystructuregraphs AT hanguosheng identificationofpremicrornasbycharacterizingtheirsequenceorderevolutioninformationandsecondarystructuregraphs AT lijinyan identificationofpremicrornasbycharacterizingtheirsequenceorderevolutioninformationandsecondarystructuregraphs AT anhvo identificationofpremicrornasbycharacterizingtheirsequenceorderevolutioninformationandsecondarystructuregraphs |
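
The abstract states that mRMR feature selection is applied before the SVM classifier. The following is a minimal sketch of the greedy mRMR idea only, not the authors' implementation (their code is at the GitHub URL given in the record). It assumes features that are already discretized, uses the "difference" form of the criterion (relevance minus mean redundancy), and runs on a made-up toy dataset; the names `mutual_information`, `mrmr_select`, `columns`, and `labels` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedy mRMR, difference form (an assumption of this sketch).

    At each step, pick the feature that maximises
    relevance(feature, labels) - mean redundancy(feature, already selected).
    `columns` is a list of discrete feature columns; returns selected indices.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented for illustration): 8 samples, binary class labels.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # feature 0: strongly related to the label
    [0, 0, 0, 1, 1, 1, 1, 1],  # feature 1: exact copy of feature 0 (redundant)
    [0, 0, 1, 0, 1, 1, 0, 1],  # feature 2: weakly related, mostly new information
    [0, 1, 0, 1, 0, 1, 0, 1],  # feature 3: unrelated to the label
]

print(mrmr_select(columns, labels, 2))
```

In this toy case the duplicate column is penalised for its redundancy with the first pick, so a weaker but complementary feature is preferred over it. In a full pipeline the selected columns would then be passed to an SVM classifier, which this sketch omits.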