
Machine learning for RNA 2D structure prediction benchmarked on experimental data

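Since the 1980s, dozens of computational methods have addressed the problem of predicting RNA secondary structure. Among them are those that follow standard optimization approaches and, more recently, machine learning (ML) algorithms. The former were repeatedly benchmarked on various datasets. The latter, on the other hand, have not yet undergone extensive analysis that could suggest to the user which algorithm best fits the problem to be solved. In this review, we compare 15 methods that predict the secondary structure of RNA, of which 6 are based on deep learning (DL), 3 on shallow learning (SL) and 6 control methods on non-ML approaches. We discuss the ML strategies implemented and perform three experiments in which we evaluate the prediction of (I) representatives of the RNA equivalence classes, (II) selected Rfam sequences and (III) RNAs from new Rfam families. We show that DL-based algorithms (such as SPOT-RNA and UFold) can outperform SL and traditional methods if the data distribution is similar in the training and testing set. However, when predicting 2D structures for new RNA families, the advantage of DL is no longer clear, and its performance is inferior or equal to that of SL and non-ML methods.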

Bibliographic Details
Main Authors:	Justyna, Marek, Antczak, Maciej, Szachniuk, Marta
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Problem Solving Protocol
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10199776/ https://www.ncbi.nlm.nih.gov/pubmed/37096592 http://dx.doi.org/10.1093/bib/bbad153

author	Justyna, Marek Antczak, Maciej Szachniuk, Marta
author_facet	Justyna, Marek Antczak, Maciej Szachniuk, Marta
author_sort	Justyna, Marek
collection	PubMed
description	Since the 1980s, dozens of computational methods have addressed the problem of predicting RNA secondary structure. Among them are those that follow standard optimization approaches and, more recently, machine learning (ML) algorithms. The former were repeatedly benchmarked on various datasets. The latter, on the other hand, have not yet undergone extensive analysis that could suggest to the user which algorithm best fits the problem to be solved. In this review, we compare 15 methods that predict the secondary structure of RNA, of which 6 are based on deep learning (DL), 3 on shallow learning (SL) and 6 control methods on non-ML approaches. We discuss the ML strategies implemented and perform three experiments in which we evaluate the prediction of (I) representatives of the RNA equivalence classes, (II) selected Rfam sequences and (III) RNAs from new Rfam families. We show that DL-based algorithms (such as SPOT-RNA and UFold) can outperform SL and traditional methods if the data distribution is similar in the training and testing set. However, when predicting 2D structures for new RNA families, the advantage of DL is no longer clear, and its performance is inferior or equal to that of SL and non-ML methods.
format	Online Article Text
id	pubmed-10199776
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10199776 2023-05-21 Machine learning for RNA 2D structure prediction benchmarked on experimental data Justyna, Marek Antczak, Maciej Szachniuk, Marta Brief Bioinform Problem Solving Protocol Since the 1980s, dozens of computational methods have addressed the problem of predicting RNA secondary structure. Among them are those that follow standard optimization approaches and, more recently, machine learning (ML) algorithms. The former were repeatedly benchmarked on various datasets. The latter, on the other hand, have not yet undergone extensive analysis that could suggest to the user which algorithm best fits the problem to be solved. In this review, we compare 15 methods that predict the secondary structure of RNA, of which 6 are based on deep learning (DL), 3 on shallow learning (SL) and 6 control methods on non-ML approaches. We discuss the ML strategies implemented and perform three experiments in which we evaluate the prediction of (I) representatives of the RNA equivalence classes, (II) selected Rfam sequences and (III) RNAs from new Rfam families. We show that DL-based algorithms (such as SPOT-RNA and UFold) can outperform SL and traditional methods if the data distribution is similar in the training and testing set. However, when predicting 2D structures for new RNA families, the advantage of DL is no longer clear, and its performance is inferior or equal to that of SL and non-ML methods. Oxford University Press 2023-04-24 /pmc/articles/PMC10199776/ /pubmed/37096592 http://dx.doi.org/10.1093/bib/bbad153 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Problem Solving Protocol Justyna, Marek Antczak, Maciej Szachniuk, Marta Machine learning for RNA 2D structure prediction benchmarked on experimental data
title	Machine learning for RNA 2D structure prediction benchmarked on experimental data
title_sort	machine learning for rna 2d structure prediction benchmarked on experimental data
topic	Problem Solving Protocol
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10199776/ https://www.ncbi.nlm.nih.gov/pubmed/37096592 http://dx.doi.org/10.1093/bib/bbad153
