ATTfold: RNA Secondary Structure Prediction With Pseudoknots Based on Attention Mechanism

Accurate RNA secondary structure information is the cornerstone of gene function research and RNA tertiary structure prediction. However, most traditional RNA secondary structure prediction algorithms are based on the dynamic programming (DP) algorithm, according to the minimum free energy theory, w...

Bibliographic Details
Main authors: Wang, Yili, Liu, Yuanning, Wang, Shuo, Liu, Zhen, Gao, Yubing, Zhang, Hao, Dong, Liyan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7770172/
https://www.ncbi.nlm.nih.gov/pubmed/33384721
http://dx.doi.org/10.3389/fgene.2020.612086
author Wang, Yili
Liu, Yuanning
Wang, Shuo
Liu, Zhen
Gao, Yubing
Zhang, Hao
Dong, Liyan
collection PubMed
description Accurate RNA secondary structure information is the cornerstone of gene function research and RNA tertiary structure prediction. However, most traditional RNA secondary structure prediction algorithms are based on dynamic programming (DP) under the minimum free energy theory, with both hard and soft constraints. Their accuracy depends heavily on the accuracy of the soft constraints (derived from experimental data such as chemical and enzymatic detection). As the RNA sequence lengthens, the time complexity of DP-based algorithms increases geometrically; as a result, they cope poorly with relatively long sequences. Furthermore, because of the complexity of pseudoknot structures, prediction methods based on traditional algorithms cannot predict secondary structures with pseudoknots well. Consequently, few algorithms have been available for pseudoknot prediction. The ATTfold algorithm proposed in this article is a deep learning algorithm based on an attention mechanism. It analyzes the global information of the RNA sequence through the attention mechanism, focuses on the correlation between paired bases, and solves the problem of long sequence prediction. Moreover, the algorithm extracts effective multi-dimensional features from a large number of RNA sequences and their structure information, combined with the hard constraints specific to RNA secondary structure. It thereby accurately determines the pairing position of each base and obtains a real and valid RNA secondary structure, including pseudoknots. Finally, after the ATTfold model was trained on tens of thousands of RNA sequences and their real secondary structures, it was compared with four classic RNA secondary structure prediction algorithms. The results show that our algorithm significantly outperforms the others and predicts the secondary structure of RNA more accurately. As the data in RNA sequence databases increase, our deep learning-based algorithm will perform even better. In the future, this kind of algorithm will become increasingly indispensable.
format Online
Article
Text
id pubmed-7770172
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7770172 2020-12-30 ATTfold: RNA Secondary Structure Prediction With Pseudoknots Based on Attention Mechanism Wang, Yili Liu, Yuanning Wang, Shuo Liu, Zhen Gao, Yubing Zhang, Hao Dong, Liyan Front Genet Genetics Frontiers Media S.A. 2020-12-15 /pmc/articles/PMC7770172/ /pubmed/33384721 http://dx.doi.org/10.3389/fgene.2020.612086 Text en Copyright © 2020 Wang, Liu, Wang, Liu, Gao, Zhang and Dong. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title ATTfold: RNA Secondary Structure Prediction With Pseudoknots Based on Attention Mechanism
topic Genetics