
Deep learning models for RNA secondary structure prediction (probably) do not generalize across families

Bibliographic Details
Main Authors:	Szikszai, Marcell; Wise, Michael; Datta, Amitava; Ward, Max; Mathews, David H
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9364374/ ; https://www.ncbi.nlm.nih.gov/pubmed/35748706 ; http://dx.doi.org/10.1093/bioinformatics/btac415

collection	PubMed
description	MOTIVATION: The secondary structure of RNA is of importance to its function. Over the last few years, several papers attempted to use machine learning to improve de novo RNA secondary structure prediction. Many of these papers report impressive results for intra-family predictions but seldom address the much more difficult (and practical) inter-family problem. RESULTS: We demonstrate that it is nearly trivial with convolutional neural networks to generate pseudo-free energy changes, modelled after structure mapping data that improve the accuracy of structure prediction for intra-family cases. We propose a more rigorous method for inter-family cross-validation that can be used to assess the performance of learning-based models. Using this method, we further demonstrate that intra-family performance is insufficient proof of generalization despite the widespread assumption in the literature and provide strong evidence that many existing learning-based models have not generalized inter-family. AVAILABILITY AND IMPLEMENTATION: Source code and data are available at https://github.com/marcellszi/dl-rna. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
id	pubmed-9364374
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Bioinformatics (Oxford University Press), published 2022-06-24, DOI 10.1093/bioinformatics/btac415
rights	© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
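The abstract contrasts intra-family evaluation, where sequences from the same RNA family can land in both the training and test sets, with inter-family cross-validation, where a whole family is held out. The minimal sketch below shows a leave-one-family-out split. It assumes each sequence already carries a family label. The function name `family_folds` and the example labels are hypothetical, chosen only for illustration. This is a sketch of the general idea, not the authors' exact protocol, which is described in the paper and in the dl-rna repository.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def family_folds(families: List[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_family, train_indices, test_indices), one fold per family.

    Every sequence of the held-out family goes to the test set, so the model
    never sees that family during training (inter-family evaluation).
    Hypothetical helper for illustration only.
    """
    # Group sequence indices by their family label.
    by_family: Dict[str, List[int]] = defaultdict(list)
    for i, fam in enumerate(families):
        by_family[fam].append(i)

    # One fold per family, in sorted order for reproducibility.
    for held_out in sorted(by_family):
        test = by_family[held_out]
        train = [i for i, fam in enumerate(families) if fam != held_out]
        yield held_out, train, test


# Hypothetical family labels for six sequences (illustrative only).
labels = ["tRNA", "5S rRNA", "tRNA", "SRP", "5S rRNA", "tRNA"]
for fam, train, test in family_folds(labels):
    # No family appears on both sides of a split.
    assert not {labels[i] for i in train} & {labels[i] for i in test}
    print(fam, train, test)
```

By contrast, a random k-fold split over the same six sequences would usually place tRNA sequences in both training and test folds, which is the intra-family setting. For array inputs, scikit-learn's `LeaveOneGroupOut` (passing the family labels as `groups`) produces the same kind of split.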
