
Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars

BACKGROUND: The prediction of the structure of large RNAs remains a particular challenge in bioinformatics, due to the computational complexity and low levels of accuracy of state-of-the-art algorithms. The pfold model couples a stochastic context-free grammar to phylogenetic analysis for a high acc...


Bibliographic Details
Main authors: Sükösd, Zsuzsanna, Knudsen, Bjarne, Værum, Morten, Kjems, Jørgen, Andersen, Ebbe S
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102635/
https://www.ncbi.nlm.nih.gov/pubmed/21501497
http://dx.doi.org/10.1186/1471-2105-12-103
author Sükösd, Zsuzsanna
Knudsen, Bjarne
Værum, Morten
Kjems, Jørgen
Andersen, Ebbe S
collection PubMed
description BACKGROUND: The prediction of the structure of large RNAs remains a particular challenge in bioinformatics, due to the computational complexity and low levels of accuracy of state-of-the-art algorithms. The pfold model couples a stochastic context-free grammar to phylogenetic analysis to achieve high prediction accuracy, but the time complexity of the algorithm and underflow errors have prevented its use for long alignments. Here we present PPfold, a multithreaded version of pfold, which is capable of predicting the structure of large RNA alignments accurately on practical timescales. RESULTS: We have distributed both the phylogenetic calculations and the inside-outside algorithm in PPfold, resulting in a significant reduction of runtime on multicore machines. We have addressed the floating-point underflow problems of pfold by implementing an extended-exponent datatype, enabling PPfold to be used for large-scale RNA structure predictions. We have also improved the user interface and portability: alongside a standalone executable and the Java source code of the program, PPfold is available as a free plugin to the CLC Workbenches. We have evaluated the accuracy of PPfold using BRaliBase I tests, and demonstrated its practical use by predicting the secondary structure of an alignment of 24 complete HIV-1 genomes in 65 minutes on an 8-core machine and identifying several known structural elements in the prediction. CONCLUSIONS: PPfold is the first parallelized comparative RNA structure prediction algorithm to date. Based on the pfold model, PPfold is capable of fast, high-quality predictions of large RNA secondary structures, such as the genomes of RNA viruses or long genomic transcripts. The techniques used in the parallelization of this algorithm may be of general applicability to other bioinformatics algorithms.
format Text
id pubmed-3102635
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3102635 2011-05-27 BMC Bioinformatics Methodology Article BioMed Central 2011-04-18 /pmc/articles/PMC3102635/ /pubmed/21501497 http://dx.doi.org/10.1186/1471-2105-12-103 Text en Copyright ©2011 Sükösd et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars
topic Methodology Article
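The abstract in this record states that PPfold avoids floating-point underflow by using an extended-exponent datatype, but the record does not describe how that datatype is built. The Java sketch below illustrates the general idea only and is not PPfold's implementation: a value is stored as a double mantissa times a power of two held in a separate long, so products of many small probabilities keep their magnitude. The names ExtendedExponentSketch, ExtNum, of, times, plus and log2 are hypothetical, and the sketch assumes non-negative, finite, non-subnormal inputs.

```java
// Illustrative sketch only; not taken from PPfold. All names are hypothetical.
final class ExtendedExponentSketch {

    // Represents mantissa * 2^exponent, with a nonzero mantissa kept in [1, 2).
    static final class ExtNum {
        final double mantissa;
        final long exponent;

        ExtNum(double mantissa, long exponent) {
            if (mantissa == 0.0) {
                this.mantissa = 0.0;
                this.exponent = 0;
            } else {
                // Move the double's own binary exponent into the long field.
                int shift = Math.getExponent(mantissa);
                this.mantissa = Math.scalb(mantissa, -shift);
                this.exponent = exponent + shift;
            }
        }

        static ExtNum of(double value) {
            return new ExtNum(value, 0);
        }

        // Product: multiply mantissas, add exponents, renormalize.
        ExtNum times(ExtNum other) {
            return new ExtNum(mantissa * other.mantissa, exponent + other.exponent);
        }

        // Sum: scale the smaller operand down to the larger operand's exponent.
        ExtNum plus(ExtNum other) {
            if (mantissa == 0.0) return other;
            if (other.mantissa == 0.0) return this;
            ExtNum big = exponent >= other.exponent ? this : other;
            ExtNum small = big == this ? other : this;
            long diff = big.exponent - small.exponent;
            if (diff > 64) return big; // smaller term is below double precision
            double sum = big.mantissa + Math.scalb(small.mantissa, (int) -diff);
            return new ExtNum(sum, big.exponent);
        }

        // Base-2 logarithm of the represented value (mantissa must be > 0).
        double log2() {
            return exponent + Math.log(mantissa) / Math.log(2.0);
        }
    }

    public static void main(String[] args) {
        double plain = 1.0;
        ExtNum extended = ExtNum.of(1.0);
        for (int i = 0; i < 10; i++) {
            plain *= 1e-200;                              // underflows to 0.0
            extended = extended.times(ExtNum.of(1e-200)); // keeps its magnitude
        }
        System.out.println("plain double product: " + plain);
        System.out.println("extended log2 value:  " + extended.log2());
    }
}
```

In this sketch the plain double product collapses to zero after the second multiplication, while the extended value still reports a finite base-2 logarithm. A production datatype would also need to handle negative values, subnormal inputs and overflow of the exponent field, none of which is attempted here.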