RNA contact prediction by data efficient deep learning
On the path to full understanding of the structure-function relationship or even design of RNA, structure prediction would offer an intriguing complement to experimental efforts. Any deep learning on RNA structure, however, is hampered by the sparsity of labeled training data. Utilizing the limited...
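Main authors: | Taubert, Oskar; von der Lehr, Fabrice; Bazarova, Alina; Faber, Christian; Knechtges, Philipp; Weiel, Marie; Debus, Charlotte; Coquelin, Daniel; Basermann, Achim; Streit, Achim; Kesselheim, Stefan; Götz, Markus; Schug, Alexander |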
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10482910/ https://www.ncbi.nlm.nih.gov/pubmed/37674020 http://dx.doi.org/10.1038/s42003-023-05244-9 |
_version_ | 1785102273952612352 |
---|---|
author | Taubert, Oskar von der Lehr, Fabrice Bazarova, Alina Faber, Christian Knechtges, Philipp Weiel, Marie Debus, Charlotte Coquelin, Daniel Basermann, Achim Streit, Achim Kesselheim, Stefan Götz, Markus Schug, Alexander |
author_facet | Taubert, Oskar von der Lehr, Fabrice Bazarova, Alina Faber, Christian Knechtges, Philipp Weiel, Marie Debus, Charlotte Coquelin, Daniel Basermann, Achim Streit, Achim Kesselheim, Stefan Götz, Markus Schug, Alexander |
author_sort | Taubert, Oskar |
collection | PubMed |
description | On the path to full understanding of the structure-function relationship or even design of RNA, structure prediction would offer an intriguing complement to experimental efforts. Any deep learning on RNA structure, however, is hampered by the sparsity of labeled training data. Utilizing the limited data available, we here focus on predicting spatial adjacencies ("contact maps") as a proxy for 3D structure. Our model, BARNACLE, combines the utilization of unlabeled data through self-supervised pre-training and efficient use of the sparse labeled data through an XGBoost classifier. BARNACLE shows a considerable improvement over both the established classical baseline and a deep neural network. In order to demonstrate that our approach can be applied to tasks with similar data constraints, we show that our findings generalize to the related setting of accessible surface area prediction. |
format | Online Article Text |
id | pubmed-10482910 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10482910 2023-09-08 RNA contact prediction by data efficient deep learning Taubert, Oskar von der Lehr, Fabrice Bazarova, Alina Faber, Christian Knechtges, Philipp Weiel, Marie Debus, Charlotte Coquelin, Daniel Basermann, Achim Streit, Achim Kesselheim, Stefan Götz, Markus Schug, Alexander Commun Biol Article On the path to full understanding of the structure-function relationship or even design of RNA, structure prediction would offer an intriguing complement to experimental efforts. Any deep learning on RNA structure, however, is hampered by the sparsity of labeled training data. Utilizing the limited data available, we here focus on predicting spatial adjacencies ("contact maps") as a proxy for 3D structure. Our model, BARNACLE, combines the utilization of unlabeled data through self-supervised pre-training and efficient use of the sparse labeled data through an XGBoost classifier. BARNACLE shows a considerable improvement over both the established classical baseline and a deep neural network. In order to demonstrate that our approach can be applied to tasks with similar data constraints, we show that our findings generalize to the related setting of accessible surface area prediction. Nature Publishing Group UK 2023-09-06 /pmc/articles/PMC10482910/ /pubmed/37674020 http://dx.doi.org/10.1038/s42003-023-05244-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Taubert, Oskar von der Lehr, Fabrice Bazarova, Alina Faber, Christian Knechtges, Philipp Weiel, Marie Debus, Charlotte Coquelin, Daniel Basermann, Achim Streit, Achim Kesselheim, Stefan Götz, Markus Schug, Alexander RNA contact prediction by data efficient deep learning |
title | RNA contact prediction by data efficient deep learning |
title_full | RNA contact prediction by data efficient deep learning |
title_fullStr | RNA contact prediction by data efficient deep learning |
title_full_unstemmed | RNA contact prediction by data efficient deep learning |
title_short | RNA contact prediction by data efficient deep learning |
title_sort | rna contact prediction by data efficient deep learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10482910/ https://www.ncbi.nlm.nih.gov/pubmed/37674020 http://dx.doi.org/10.1038/s42003-023-05244-9 |
work_keys_str_mv | AT taubertoskar rnacontactpredictionbydataefficientdeeplearning AT vonderlehrfabrice rnacontactpredictionbydataefficientdeeplearning AT bazarovaalina rnacontactpredictionbydataefficientdeeplearning AT faberchristian rnacontactpredictionbydataefficientdeeplearning AT knechtgesphilipp rnacontactpredictionbydataefficientdeeplearning AT weielmarie rnacontactpredictionbydataefficientdeeplearning AT debuscharlotte rnacontactpredictionbydataefficientdeeplearning AT coquelindaniel rnacontactpredictionbydataefficientdeeplearning AT basermannachim rnacontactpredictionbydataefficientdeeplearning AT streitachim rnacontactpredictionbydataefficientdeeplearning AT kesselheimstefan rnacontactpredictionbydataefficientdeeplearning AT gotzmarkus rnacontactpredictionbydataefficientdeeplearning AT schugalexander rnacontactpredictionbydataefficientdeeplearning |