TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences

BACKGROUND: The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences....

Bibliographic Details
Main Authors:	Harmanci, Arif O; Sharma, Gaurav; Mathews, David H
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3120699/ https://www.ncbi.nlm.nih.gov/pubmed/21507242 http://dx.doi.org/10.1186/1471-2105-12-108

author	Harmanci, Arif O; Sharma, Gaurav; Mathews, David H
collection	PubMed
description	BACKGROUND: The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. RESULTS: TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. 
Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms. CONCLUSIONS: TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.
format	Online Article Text
id	pubmed-3120699
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3120699 2011-06-23 TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences Harmanci, Arif O; Sharma, Gaurav; Mathews, David H BMC Bioinformatics Research Article BioMed Central 2011-04-20 /pmc/articles/PMC3120699/ /pubmed/21507242 http://dx.doi.org/10.1186/1471-2105-12-108 Text en Copyright ©2011 Harmanci et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3120699/ https://www.ncbi.nlm.nih.gov/pubmed/21507242 http://dx.doi.org/10.1186/1471-2105-12-108
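
The abstract in this record describes how TurboFold builds extrinsic information for one sequence: it combines alignment-based co-incidence probabilities with the base pairing probabilities that the other sequences had in the previous iteration, then feeds the result into the partition function as a free energy modification. The sketch below illustrates only that combination step, and it rests on assumptions the abstract does not state. The function names, the data layout, the scaling by the largest entry, the logarithmic free energy mapping, and the default `gamma` are all hypothetical. It is not the RNAstructure implementation, and it omits the nearest neighbor partition function itself.

```python
import math

# Approximate gas constant times 310.15 K, in kcal/mol.
RT_KCAL_PER_MOL = 0.616


def extrinsic_information(pair_probs, coincidence, target):
    """Accumulate pairing evidence for `target` from every other sequence.

    pair_probs[s][k][l]: previous-iteration probability that positions
        k < l of sequence s are paired.
    coincidence[(target, s)][i][k]: probability that position i of
        `target` is co-incident with position k of sequence s.
    Returns an upper-triangular matrix scaled so its largest entry is 1
    (the scaling is an assumption of this sketch).
    """
    n = len(pair_probs[target])
    ext = [[0.0] * n for _ in range(n)]
    for s, probs in pair_probs.items():
        if s == target:
            continue
        co = coincidence[(target, s)]
        m = len(probs)
        for i in range(n):
            for j in range(i + 1, n):
                total = 0.0
                for k in range(m):
                    if co[i][k] == 0.0:
                        continue  # position i is never co-incident with k
                    for l in range(k + 1, m):
                        total += co[i][k] * co[j][l] * probs[k][l]
                ext[i][j] += total
    peak = max(max(row) for row in ext)
    if peak > 0.0:
        ext = [[value / peak for value in row] for row in ext]
    return ext


def pseudo_free_energy(ext_value, gamma=0.3, floor=1e-6):
    """Assumed mapping to a pairing term: -gamma * RT * ln(ext), kcal/mol."""
    return -gamma * RT_KCAL_PER_MOL * math.log(max(ext_value, floor))


# Toy input: two 4-nucleotide sequences aligned position by position.
identity = [[1.0 if i == k else 0.0 for k in range(4)] for i in range(4)]
empty = [[0.0] * 4 for _ in range(4)]
probs_b = [row[:] for row in empty]
probs_b[0][3] = 0.9  # sequence B is believed to pair positions 0 and 3

pair_probs = {"A": empty, "B": probs_b}
coincidence = {("A", "B"): identity}

ext_a = extrinsic_information(pair_probs, coincidence, "A")
```

In this toy case the only pair supported for sequence A is (0, 3), the pair that sequence B is believed to form. Under the assumed mapping, the best-supported pair gets a zero modification and weaker pairs get a positive one. A full iteration would recompute base pairing probabilities with these terms added, then repeat the step for every sequence.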
