
UFold: fast and accurate RNA secondary structure prediction with deep learning

For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but the prediction performance has reached a plateau over time. Traditional RNA secondary structure p...


Bibliographic Details
Main authors: Fu, Laiyi; Cao, Yingxin; Wu, Jie; Peng, Qinke; Nie, Qing; Xie, Xiaohui
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8860580/
https://www.ncbi.nlm.nih.gov/pubmed/34792173
http://dx.doi.org/10.1093/nar/gkab1074
author Fu, Laiyi
Cao, Yingxin
Wu, Jie
Peng, Qinke
Nie, Qing
Xie, Xiaohui
collection PubMed
description For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but the prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models through free energy minimization, which imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold proposes a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving performance similar to that of traditional methods when trained and tested on distinct RNA families. UFold is also able to predict pseudoknots accurately. Its prediction is fast, with an inference time of about 160 ms per sequence up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold.
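The abstract does not spell out how the image-like representation is built, so the following is only a minimal sketch of one plausible construction, not the authors' implementation. It assumes the image is formed from pairwise products of one-hot base encodings, which gives 16 binary L x L channels, and that "base-pairing rules" means the canonical A-U, G-C and G-U pairs. The names `one_hot`, `pair_image` and `canonical_mask` are hypothetical and chosen for illustration.

```python
from itertools import product

BASES = "AUCG"  # channel order is an arbitrary choice for this sketch


def one_hot(seq):
    """Return an L x 4 one-hot matrix; unknown symbols map to all zeros."""
    seq = seq.upper().replace("T", "U")
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def pair_image(seq):
    """Build 16 binary L x L channels, keyed by the base pair they mark.

    Channel "XY" holds 1 at position (i, j) exactly when seq[i] == X and
    seq[j] == Y, i.e. the outer product of two one-hot columns.
    (Assumed construction; the abstract only says "image-like".)
    """
    x = one_hot(seq)
    n = len(x)
    image = {}
    for (ka, a), (kb, b) in product(enumerate(BASES), repeat=2):
        image[a + b] = [[x[i][ka] * x[j][kb] for j in range(n)] for i in range(n)]
    return image


def canonical_mask(image):
    """Mark positions (i, j) whose bases form an A-U, G-C or G-U pair."""
    allowed = ["AU", "UA", "GC", "CG", "GU", "UG"]
    n = len(image["AU"])
    return [[sum(image[p][i][j] for p in allowed) for j in range(n)] for i in range(n)]


img = pair_image("GGAUCC")
mask = canonical_mask(img)
```

Each channel is an L x L grid, so the stack can be passed to a convolutional network in the same way as a multi-channel image, and the mask can be used to restrict predicted contacts to pairs that the assumed rules allow.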
format Online
Article
Text
id pubmed-8860580
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8860580 2022-02-22 UFold: fast and accurate RNA secondary structure prediction with deep learning Fu, Laiyi Cao, Yingxin Wu, Jie Peng, Qinke Nie, Qing Xie, Xiaohui Nucleic Acids Res Methods Online Oxford University Press 2021-11-18 /pmc/articles/PMC8860580/ /pubmed/34792173 http://dx.doi.org/10.1093/nar/gkab1074 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
title UFold: fast and accurate RNA secondary structure prediction with deep learning
topic Methods Online