Deep forest ensemble learning for classification of alignments of non-coding RNA sequences based on multi-view structure representations

Non-coding RNAs (ncRNAs) play crucial roles in multiple biological processes. However, the functions of only a few ncRNAs have been well studied. Given the significance of ncRNA classification for understanding ncRNA functions, more and more computational methods have been introduced to improve the cla...

Bibliographic Details
Main Authors: Li, Ying, Zhang, Qi, Liu, Zhaoqian, Wang, Cankun, Han, Siyu, Ma, Qin, Du, Wei
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294561/
https://www.ncbi.nlm.nih.gov/pubmed/33367506
http://dx.doi.org/10.1093/bib/bbaa354
collection PubMed
description Non-coding RNAs (ncRNAs) play crucial roles in multiple biological processes. However, the functions of only a few ncRNAs have been well studied. Given the significance of ncRNA classification for understanding ncRNA functions, more and more computational methods have been introduced to improve the classification automatically and accurately. In this paper, based on a convolutional neural network and a deep forest algorithm, multi-grained cascade forest (GcForest), we propose a novel deep fusion learning framework, GcForest fusion method (GCFM), to classify alignments of ncRNA sequences for accurate clustering of ncRNAs. GCFM integrates a multi-view structure feature representation including sequence-structure alignment encoding, structure image representation and shape alignment encoding of structural subunits, enabling us to capture the potential specificity between ncRNAs. For the classification of pairwise alignments of two ncRNA sequences, the F-value of GCFM improves by 6% over an existing alignment-based method. Furthermore, the clustering of ncRNA families is carried out based on the classification matrix generated by GCFM. Results suggest better performance (with a 20% improvement in accuracy) than existing ncRNA clustering methods (RNAclust, Ensembleclust and CNNclust). Additionally, we apply GCFM to construct a phylogenetic tree of ncRNAs and predict the probability of interactions between RNAs. Most ncRNAs are located correctly in the phylogenetic tree, and the prediction accuracy of RNA interaction is 90.63%. A web server (http://bmbl.sdstate.edu/gcfm/) has been developed to maximize its availability, and the source code and related data are available at the same URL.
format Online
Article
Text
id pubmed-8294561
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8294561 2021-07-22 Brief Bioinform Articles
Oxford University Press 2020-12-23 /pmc/articles/PMC8294561/ /pubmed/33367506 http://dx.doi.org/10.1093/bib/bbaa354 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Articles