Caveats to Deep Learning Approaches to RNA Secondary Structure Prediction

Bibliographic Details
Main Authors:	Flamm, Christoph; Wielach, Julia; Wolfinger, Michael T.; Badelt, Stefan; Lorenz, Ronny; Hofacker, Ivo L.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A., 2022
Subjects:	Bioinformatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580944/ https://www.ncbi.nlm.nih.gov/pubmed/36304289 http://dx.doi.org/10.3389/fbinf.2022.835422
Journal:	Frontiers in Bioinformatics (Front Bioinform)
Published online:	2022-07-11
Identifiers:	PMID 36304289; PMCID PMC9580944; DOI 10.3389/fbinf.2022.835422
Copyright:	© 2022 Flamm, Wielach, Wolfinger, Badelt, Lorenz and Hofacker.
License:	This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Full Description

Machine learning (ML), and deep learning techniques in particular, have gained popularity for predicting structures from biopolymer sequences. An interesting case is the prediction of RNA secondary structures, where well-established, biophysics-based methods exist. The accuracy of these classical methods is limited by the lack of experimental parameters and by certain simplifying assumptions, and it has seen little improvement over the last decade. This makes RNA folding an attractive target for machine learning, and consequently several deep learning models have been proposed in recent years. However, for ML approaches to be competitive for de novo structure prediction, the models must not just demonstrate good phenomenological fits but be able to learn a (complex) biophysical model. In this contribution we discuss limitations of current approaches, in particular those due to biases in the training data. Furthermore, we propose to study the capabilities and limitations of ML models by first applying them to synthetic data (obtained from a simplified biophysical model) that can be generated in arbitrary amounts and where all biases can be controlled. We assume that a deep learning model that performs well on these synthetic data would also perform well on real data, and vice versa. We apply this idea by testing several ML models of varying complexity. Finally, we show that the best models are capable of capturing many, but not all, properties of RNA secondary structures. Most severely, the number of predicted base pairs scales quadratically with sequence length, even though a secondary structure can only accommodate a linear number of pairs.
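The abstract proposes first testing models on synthetic data produced by a simplified biophysical model, because such data can be generated in any amount and its biases are under the experimenter's control. As a rough illustration only, the Python sketch below uses Nussinov-style base-pair maximization as a stand-in for "a simplified biophysical model". This toy model, the minimum hairpin size of 3, and the helper names `nussinov_fold`, `random_sequence` and `synthetic_dataset` are assumptions for illustration, not the model used by the authors. Sequence length and GC content are explicit parameters, which is what makes composition and length biases controllable.

```python
import random
from typing import Iterator, Tuple

# Hypothetical toy model: canonical Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases enclosed by a hairpin


def nussinov_fold(seq: str) -> str:
    """Return a maximum-base-pair, pseudoknot-free structure in dot-bracket notation."""
    n = len(seq)
    # best[i][j] = maximum number of pairs on seq[i..j] (inclusive); spans too
    # short to close a hairpin keep their initial value of 0.
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_HAIRPIN + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - MIN_HAIRPIN):  # case 2: j pairs with k
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_HAIRPIN:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_HAIRPIN):
            if (seq[k], seq[j]) in ALLOWED_PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + best[k + 1][j - 1] + 1 == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


def random_sequence(length: int, gc_content: float, rng: random.Random) -> str:
    """Draw an i.i.d. RNA sequence with a controlled expected GC content."""
    weights = [gc_content / 2, gc_content / 2, (1 - gc_content) / 2, (1 - gc_content) / 2]
    return "".join(rng.choices("GCAU", weights=weights, k=length))


def synthetic_dataset(count: int, length: int, gc_content: float = 0.5,
                      seed: int = 0) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, structure) pairs labelled by the toy model above."""
    rng = random.Random(seed)
    for _ in range(count):
        seq = random_sequence(length, gc_content, rng)
        yield seq, nussinov_fold(seq)


for seq, db in synthetic_dataset(count=3, length=30, seed=42):
    print(seq)
    print(db)
```

A real study would swap the toy folding rule for a proper energy model, but the design point carries over: because the generator is seeded and parameterized, the same data set can be regenerated exactly, and one property (say, length distribution) can be shifted between training and test sets while everything else is held fixed.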
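The abstract's closing point rests on a simple counting argument: in a secondary structure each nucleotide pairs at most once, so a sequence of length n holds at most ⌊n/2⌋ base pairs, whereas a predictor that scores all n(n-1)/2 position pairs independently can return quadratically many. A minimal Python sketch, assuming the model outputs an n×n matrix of pair scores (an assumption for illustration, not a description of any specific model in the paper). The names `threshold_pairs`, `is_secondary_structure` and `greedy_decode` are hypothetical, and the greedy decoding is just one simple way to enforce the constraint, not a remedy proposed by the authors.

```python
import itertools
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def threshold_pairs(scores: Sequence[Sequence[float]], threshold: float = 0.5) -> List[Pair]:
    """Naive decoding: keep every (i, j) with i < j whose score exceeds the threshold.

    Only the upper triangle is read. Nothing stops a position from appearing in
    many pairs, so up to n * (n - 1) / 2 pairs can be returned.
    """
    n = len(scores)
    return [(i, j) for i, j in itertools.combinations(range(n), 2) if scores[i][j] > threshold]


def is_secondary_structure(pairs: Sequence[Pair], n: int) -> bool:
    """True if every position pairs at most once and no two pairs cross."""
    partner = [None] * n
    for i, j in pairs:
        if not (0 <= i < j < n) or partner[i] is not None or partner[j] is not None:
            return False
        partner[i], partner[j] = j, i
    for (a, b), (c, d) in itertools.combinations(pairs, 2):
        if a < c < b < d or c < a < d < b:  # crossing pairs form a pseudoknot
            return False
    return True


def greedy_decode(scores: Sequence[Sequence[float]], threshold: float = 0.5) -> List[Pair]:
    """Accept candidate pairs best-first, skipping any that would reuse a position
    or cross an already accepted pair. Hairpin loop lengths are NOT enforced."""
    n = len(scores)
    candidates = sorted(threshold_pairs(scores, threshold),
                        key=lambda p: scores[p[0]][p[1]], reverse=True)
    accepted: List[Pair] = []
    for pair in candidates:
        if is_secondary_structure(accepted + [pair], n):
            accepted.append(pair)
    return sorted(accepted)


# Toy input: a score matrix that rates every position pair as likely (0.9).
n = 20
scores = [[0.9] * n for _ in range(n)]
naive = threshold_pairs(scores)
print(len(naive), is_secondary_structure(naive, n))  # 190 False
decoded = greedy_decode(scores)
print(len(decoded), is_secondary_structure(decoded, n))  # 10 True
```

With n = 20 the naive decoder returns all 190 pairs, while any valid structure (including the greedy result) has at most 10. Doubling n roughly quadruples the first number but only doubles the bound, which is the mismatch the abstract describes.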