LazySampling and LinearSampling: fast stochastic sampling of RNA secondary structure with applications to SARS-CoV-2

Many RNAs fold into multiple structures at equilibrium, and there is a need to sample these structures according to their probabilities in the ensemble. The conventional sampling algorithm suffers from two limitations: (i) the sampling phase is slow due to many repeated calculations; and (ii) the en...

Bibliographic Details
Main authors: Zhang, He; Li, Sizhen; Zhang, Liang; Mathews, David H; Huang, Liang
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881153/
https://www.ncbi.nlm.nih.gov/pubmed/36401871
http://dx.doi.org/10.1093/nar/gkac1029
author Zhang, He
Li, Sizhen
Zhang, Liang
Mathews, David H
Huang, Liang
collection PubMed
description Many RNAs fold into multiple structures at equilibrium, and there is a need to sample these structures according to their probabilities in the ensemble. The conventional sampling algorithm suffers from two limitations: (i) the sampling phase is slow due to many repeated calculations; and (ii) the end-to-end runtime scales cubically with the sequence length. These issues make the algorithm difficult to apply to long RNAs, such as the full genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To address these problems, we devise a new sampling algorithm, LazySampling, which eliminates redundant work via on-demand caching. Based on LazySampling, we further derive LinearSampling, an end-to-end linear-time sampling algorithm. In benchmarks on nine diverse RNA families, the sampled structures from LinearSampling correlate better with the well-established secondary structures than those from Vienna RNAsubopt and RNAplfold. More importantly, LinearSampling is orders of magnitude faster than standard tools, being 428× faster (72 s versus 8.6 h) than RNAsubopt on the full genome of SARS-CoV-2 (29 903 nt). The resulting sample landscape correlates well with the experimentally guided secondary structure models, and is closer to the alternative conformations revealed by experimentally driven analysis. Finally, LinearSampling finds 23 regions of 15 nt with high accessibilities in the SARS-CoV-2 genome, which are potential targets for COVID-19 diagnostics and therapeutics.
format Online
Article
Text
id pubmed-9881153
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9881153 2023-01-31 LazySampling and LinearSampling: fast stochastic sampling of RNA secondary structure with applications to SARS-CoV-2. Zhang, He; Li, Sizhen; Zhang, Liang; Mathews, David H; Huang, Liang. Nucleic Acids Res. Methods Online. Oxford University Press 2022-11-18. /pmc/articles/PMC9881153/ /pubmed/36401871 http://dx.doi.org/10.1093/nar/gkac1029 Text en. © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title LazySampling and LinearSampling: fast stochastic sampling of RNA secondary structure with applications to SARS-CoV-2
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881153/
https://www.ncbi.nlm.nih.gov/pubmed/36401871
http://dx.doi.org/10.1093/nar/gkac1029
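Illustrative note: the abstract describes sampling structures according to their ensemble probabilities and removing repeated work in the sampling phase via on-demand caching. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes a toy energy model (every allowed base pair gets the same Boltzmann weight, `PAIR_WEIGHT`, and a hairpin loop needs at least `MIN_LOOP` unpaired bases) instead of a real thermodynamic model, and it keeps a conventional cubic-time partition function, so it does not attempt the linear-time part of LinearSampling. All names (`make_sampler`, `choices`, `choice_cache`) are hypothetical.

```python
import bisect
import math
import random
from functools import lru_cache

# Toy assumptions (not from the paper): allowed pairs, minimum hairpin
# loop size, and one shared Boltzmann weight for every base pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3
PAIR_WEIGHT = math.exp(1.0)


def make_sampler(seq, seed=0):
    """Return (Q, sample, choice_cache) for the toy model on `seq`."""
    n = len(seq)
    rng = random.Random(seed)

    @lru_cache(maxsize=None)
    def Q(i, j):
        """Partition function of seq[i..j] (inclusive), computed on demand."""
        if j - i <= MIN_LOOP:
            return 1.0  # too short to hold a pair: only the open chain
        total = Q(i, j - 1)  # case 1: position j is unpaired
        for k in range(i, j - MIN_LOOP):  # case 2: j pairs with some k
            if (seq[k], seq[j]) in PAIRS:
                total += Q(i, k - 1) * PAIR_WEIGHT * Q(k + 1, j - 1)
        return total

    # Lazy cache: the weighted alternatives of a span are built the first
    # time a sample visits it and reused by every later sample.
    choice_cache = {}

    def choices(i, j):
        if (i, j) not in choice_cache:
            running = Q(i, j - 1)
            options, cumulative = [None], [running]  # None means "j unpaired"
            for k in range(i, j - MIN_LOOP):
                if (seq[k], seq[j]) in PAIRS:
                    running += Q(i, k - 1) * PAIR_WEIGHT * Q(k + 1, j - 1)
                    options.append(k)
                    cumulative.append(running)
            choice_cache[(i, j)] = (options, cumulative)
        return choice_cache[(i, j)]

    def sample():
        """Draw one structure in dot-bracket notation."""
        structure = ["."] * n
        stack = [(0, n - 1)]
        while stack:
            i, j = stack.pop()
            if j - i <= MIN_LOOP:
                continue  # nothing to decide on this span
            options, cumulative = choices(i, j)
            r = rng.random() * cumulative[-1]
            idx = min(bisect.bisect_right(cumulative, r), len(options) - 1)
            k = options[idx]
            if k is None:
                stack.append((i, j - 1))
            else:
                structure[k], structure[j] = "(", ")"
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
        return "".join(structure)

    return Q, sample, choice_cache


if __name__ == "__main__":
    Q, sample, cache = make_sampler("GGGAAAUCCGCAAAGC", seed=42)
    for _ in range(5):
        print(sample())
    print("spans cached on demand:", len(cache))
```

Only spans that some sample actually reaches ever get an entry in `choice_cache`, which is the point of building the alternatives lazily rather than for every span up front.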