Cargando…
Evolutionary Triplet Models of Structured RNA
The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a “transducer composition” algorit...
Autores principales: | , |
---|---|
Formato: | Texto |
Lenguaje: | English |
Publicado: |
Public Library of Science
2009
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2725318/ https://www.ncbi.nlm.nih.gov/pubmed/19714212 http://dx.doi.org/10.1371/journal.pcbi.1000483 |
_version_ | 1782170509045137408 |
---|---|
author | Bradley, Robert K. Holmes, Ian |
author_facet | Bradley, Robert K. Holmes, Ian |
author_sort | Bradley, Robert K. |
collection | PubMed |
description | The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a “transducer composition” algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms, which are robust to null cycles and empty bifurcations, for parsing this grammar. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally-characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs. We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, the algorithms for maximum likelihood (ML) alignment of three known RNA structures and a known phylogeny and inference of the common ancestral structure. We compared this ML algorithm to a variety of related, but simpler, techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model and also a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior-decoding was next; and fine details of the model were least important. Posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, so this motivates the use of sum-over-pairs heuristics where possible (and approximate sum-over-pairs). For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable. |
format | Text |
id | pubmed-2725318 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-27253182009-08-28 Evolutionary Triplet Models of Structured RNA Bradley, Robert K. Holmes, Ian PLoS Comput Biol Research Article The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a “transducer composition” algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms, which are robust to null cycles and empty bifurcations, for parsing this grammar. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally-characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs. We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, the algorithms for maximum likelihood (ML) alignment of three known RNA structures and a known phylogeny and inference of the common ancestral structure. We compared this ML algorithm to a variety of related, but simpler, techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model and also a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior-decoding was next; and fine details of the model were least important. Posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, so this motivates the use of sum-over-pairs heuristics where possible (and approximate sum-over-pairs). For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable. Public Library of Science 2009-08-28 /pmc/articles/PMC2725318/ /pubmed/19714212 http://dx.doi.org/10.1371/journal.pcbi.1000483 Text en Bradley, Holmes. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Bradley, Robert K. Holmes, Ian Evolutionary Triplet Models of Structured RNA |
title | Evolutionary Triplet Models of Structured RNA |
title_full | Evolutionary Triplet Models of Structured RNA |
title_fullStr | Evolutionary Triplet Models of Structured RNA |
title_full_unstemmed | Evolutionary Triplet Models of Structured RNA |
title_short | Evolutionary Triplet Models of Structured RNA |
title_sort | evolutionary triplet models of structured rna |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2725318/ https://www.ncbi.nlm.nih.gov/pubmed/19714212 http://dx.doi.org/10.1371/journal.pcbi.1000483 |
work_keys_str_mv | AT bradleyrobertk evolutionarytripletmodelsofstructuredrna AT holmesian evolutionarytripletmodelsofstructuredrna |