
Research on RNA secondary structure predicting via bidirectional recurrent neural network

BACKGROUND: RNA secondary structure prediction is an important research topic in bioinformatics. Predicting RNA secondary structure with pseudoknots has been proved to be an NP-hard problem. Traditional machine learning methods cannot effectively apply sequence inform...


Bibliographic Details
Main authors: Lu, Weizhong, Cao, Yan, Wu, Hongjie, Ding, Yijie, Song, Zhengwei, Zhang, Yu, Fu, Qiming, Li, Haiou
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8427827/
https://www.ncbi.nlm.nih.gov/pubmed/34496763
http://dx.doi.org/10.1186/s12859-021-04332-z
author Lu, Weizhong
Cao, Yan
Wu, Hongjie
Ding, Yijie
Song, Zhengwei
Zhang, Yu
Fu, Qiming
Li, Haiou
author_sort Lu, Weizhong
collection PubMed
description BACKGROUND: RNA secondary structure prediction is an important research topic in bioinformatics. Predicting RNA secondary structure with pseudoknots has been proved to be an NP-hard problem. Because of constraints in their own model structure, traditional machine learning methods cannot effectively use sequence information from sequences of different lengths when predicting RNA secondary structure. In addition, the numbers of paired and unpaired bases in RNA sequences differ greatly, and this imbalance between positive and negative samples easily causes the model to fall into a local optimum. To solve these problems, this paper proposes a variable-length dynamic bidirectional Gated Recurrent Unit (VLDB GRU) model. By introducing a flag vector, the model can accept sequences of different lengths. The model can also make full use of the base information before and after the predicted base, and it avoids losing information to truncation. A weight vector that dynamically adjusts the loss function of each base during training addresses the sample imbalance. RESULTS: The proposed algorithm is compared with existing algorithms on five representative subsets of the RNA STRAND data set. The experimental results show that the accuracy and Matthews correlation coefficient of the method are improved by 4.7% and 11.4%, respectively. CONCLUSIONS: The introduced flag vector allows the model to effectively use the information before and after each base in the sequence; the introduced weight vector addresses the sample imbalance. Compared with the other algorithms, the VLDB GRU algorithm proposed in this paper achieves the best prediction results.
format Online
Article
Text
id pubmed-8427827
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8427827 2021-09-10 Research on RNA secondary structure predicting via bidirectional recurrent neural network Lu, Weizhong Cao, Yan Wu, Hongjie Ding, Yijie Song, Zhengwei Zhang, Yu Fu, Qiming Li, Haiou BMC Bioinformatics Research BioMed Central 2021-09-08 /pmc/articles/PMC8427827/ /pubmed/34496763 http://dx.doi.org/10.1186/s12859-021-04332-z Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Research on RNA secondary structure predicting via bidirectional recurrent neural network
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8427827/
https://www.ncbi.nlm.nih.gov/pubmed/34496763
http://dx.doi.org/10.1186/s12859-021-04332-z