Hidden Markov Models for Evolution and Comparative Genomics Analysis

Bibliographic Details
Main Authors: Bykova, Nadezda A., Favorov, Alexander V., Mironov, Andrey A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3676395/
https://www.ncbi.nlm.nih.gov/pubmed/23762278
http://dx.doi.org/10.1371/journal.pone.0065012
author Bykova, Nadezda A.
Favorov, Alexander V.
Mironov, Andrey A.
collection PubMed
description Reconstructing ancestral states from a phylogeny and data on extant species is a problem that arises in a wide range of biological studies. A continuous-time Markov model of discrete-state evolution is generally used for this reconstruction. We modify this model to account for the case in which the states of the extant species are uncertain. This situation arises, for example, when the states of extant species are predicted by a program and are therefore known only with some level of reliability, which is common in bioinformatics. The main idea is to formulate the problem as a hidden Markov model on a tree (tree HMM, tHMM), in which the basic continuous-time Markov model is extended with emission probabilities of the observed data (e.g. prediction scores) for each underlying discrete state. Our tHMM decoding algorithm allows us to predict states at the ancestral nodes and to refine states at the leaves on the basis of quantitative comparative genomics. A test on simulated data shows that the tHMM approach, applied to a continuous variable reflecting the probabilities of the states (i.e. the prediction score), appears to be more accurate than reconstruction from discrete state assignments defined by the best score threshold. We provide examples of applying our model to the evolutionary analysis of N-terminal signal peptides and transcription factor binding sites in bacteria. The program is freely available at http://bioinf.fbb.msu.ru/~nadya/tHMM and via a web service at http://bioinf.fbb.msu.ru/treehmmweb.
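The idea in the description can be illustrated with a minimal sketch. It is not the authors' program or their decoding algorithm: it assumes only two hidden states, Gaussian emission densities for the leaf scores, a stationary prior at the root, and posterior (sum-product) decoding. All names, rates, branch lengths, and scores below are hypothetical.

```python
import math

def transition_matrix(rate01, rate10, t):
    """Two-state continuous-time Markov chain: P(t)[i][j] over a branch of length t."""
    total = rate01 + rate10
    pi0, pi1 = rate10 / total, rate01 / total  # stationary distribution
    decay = math.exp(-total * t)
    return [[pi0 + pi1 * decay, pi1 - pi1 * decay],
            [pi0 - pi0 * decay, pi1 + pi0 * decay]]

def gaussian_pdf(x, mean, sd):
    """Emission density of an observed score given a hidden state (assumed Gaussian)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_states(children, scores, root, rate01, rate10, emissions):
    """Posterior state probabilities at every node (upward pass, then downward pass)."""
    total = rate01 + rate10
    prior = [rate10 / total, rate01 / total]  # stationary prior at the root (assumption)
    up, msg, post = {}, {}, {}

    def upward(node):
        # up[node][s] = likelihood of the scores below node, given node is in state s
        kids = children.get(node, [])
        if not kids:
            up[node] = [gaussian_pdf(scores[node], *emissions[s]) for s in (0, 1)]
            return
        up[node] = [1.0, 1.0]
        for child, t in kids:
            upward(child)
            p = transition_matrix(rate01, rate10, t)
            # msg[child][s] = likelihood of the child's subtree given the parent state s
            msg[child] = [sum(p[s][j] * up[child][j] for j in (0, 1)) for s in (0, 1)]
            for s in (0, 1):
                up[node][s] *= msg[child][s]

    def downward(node, down):
        # down[s] = weight of all evidence outside node's subtree, with node in state s
        weights = [up[node][s] * down[s] for s in (0, 1)]
        norm = sum(weights)
        post[node] = [w / norm for w in weights]
        for child, t in children.get(node, []):
            p = transition_matrix(rate01, rate10, t)
            # Remove the child's own message; assumes it is nonzero (no underflow).
            outside = [down[s] * up[node][s] / msg[child][s] for s in (0, 1)]
            downward(child, [sum(outside[s] * p[s][j] for s in (0, 1)) for j in (0, 1)])

    upward(root)
    downward(root, prior)
    return post

# Hypothetical tree: root -> (A, l3), A -> (l1, l2); each branch has length 0.1.
children = {"root": [("A", 0.1), ("l3", 0.1)], "A": [("l1", 0.1), ("l2", 0.1)]}
scores = {"l1": 0.9, "l2": 0.85, "l3": 0.5}   # hypothetical prediction scores
emissions = {0: (0.2, 0.2), 1: (0.8, 0.2)}    # (mean, sd) of the score per state
post = posterior_states(children, scores, "root", 1.0, 1.0, emissions)
for node in ("root", "A", "l1", "l2", "l3"):
    print(node, [round(p, 3) for p in post[node]])
```

In this toy input the score of leaf l3 lies halfway between the two emission means, so a score threshold alone cannot classify it; the posterior for l3 is instead pulled toward the state supported by the rest of the tree, which is the kind of leaf refinement the description mentions.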
format Online
Article
Text
id pubmed-3676395
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3676395 2013-06-12 Hidden Markov Models for Evolution and Comparative Genomics Analysis Bykova, Nadezda A. Favorov, Alexander V. Mironov, Andrey A. PLoS One Research Article
Public Library of Science 2013-06-07 /pmc/articles/PMC3676395/ /pubmed/23762278 http://dx.doi.org/10.1371/journal.pone.0065012 Text en © 2013 Bykova et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Hidden Markov Models for Evolution and Comparative Genomics Analysis
topic Research Article