Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs

BACKGROUND: Distinction between pre-microRNAs (precursor microRNAs) and length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanism of RNA biological processes. Machine learning techniques have been widely applied to deal with this challenging problem. However, most of them ma...

Bibliographic Details
Main Authors: Ma, Yuanlin; Yu, Zuguo; Han, Guosheng; Li, Jinyan; Anh, Vo
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311913/
https://www.ncbi.nlm.nih.gov/pubmed/30598066
http://dx.doi.org/10.1186/s12859-018-2518-2
author Ma, Yuanlin
Yu, Zuguo
Han, Guosheng
Li, Jinyan
Anh, Vo
collection PubMed
description BACKGROUND: Distinction between pre-microRNAs (precursor microRNAs) and length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanism of RNA biological processes. Machine learning techniques have been widely applied to deal with this challenging problem. However, most of them mainly focus on secondary structure information of pre-microRNAs, while ignoring sequence-order information and sequence evolution information. RESULTS: We use new features for the machine learning algorithms to improve the classification performance by characterizing both sequence order evolution information and secondary structure graphs. We developed three steps to extract these features of pre-microRNAs. We first extract features from PSI-BLAST profiles and Hilbert-Huang transforms, which contain rich sequence evolution information and sequence-order information respectively. We then obtain properties of small molecular networks of pre-microRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they can depict both global and local characteristics of pre-microRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is adopted before a support vector machine (SVM) is applied as our classifier. The constructed classification model is named MicroRNA-NHPred. The performance of MicroRNA-NHPred is high and stable, and better than that of state-of-the-art methods, achieving an accuracy of up to 94.83% on the same benchmark datasets. CONCLUSIONS: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the sequences and secondary structures, which is capable of characterizing the sequence evolution information and sequence-order information, and the global and local information of pre-microRNA secondary structures. MicroRNA-NHPred is a valuable method for pre-microRNA identification. The source code of our method can be downloaded from https://github.com/myl446/MicroRNA-NHPred. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2518-2) contains supplementary material, which is available to authorized users.
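Note on the feature selection step: the abstract states only that mRMR is applied before the SVM. As a rough illustration of what such a step does, the following is a minimal sketch of greedy mRMR, not the authors' implementation. It assumes the features have already been discretized, and it uses the mutual-information difference criterion (relevance minus mean redundancy); both are assumptions, as are the function names `mutual_information` and `mrmr_select`.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR over discrete feature columns (illustrative sketch).

    features: dict mapping a feature name to its list of values, one per sample.
    labels:   list of class labels, one per sample.
    Returns the names of up to k selected features, in selection order.
    """
    # Relevance: mutual information between each feature and the class label.
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    remaining = set(features)

    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            # Redundancy: mean mutual information with already selected features.
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy

        # Sorting makes tie-breaking deterministic (alphabetical by name).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a pipeline like the one the abstract describes, the selected columns would then be passed to an SVM classifier. The number of features to keep (`k`) is a tuning choice that the abstract does not specify.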
format Online
Article
Text
id pubmed-6311913
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6311913 2019-01-07 BMC Bioinformatics Research BioMed Central 2018-12-31 /pmc/articles/PMC6311913/ /pubmed/30598066 http://dx.doi.org/10.1186/s12859-018-2518-2 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311913/
https://www.ncbi.nlm.nih.gov/pubmed/30598066
http://dx.doi.org/10.1186/s12859-018-2518-2