Cargando…
Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions
Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
MDPI
2013
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4009796/ https://www.ncbi.nlm.nih.gov/pubmed/24833225 http://dx.doi.org/10.3390/biology2041282 |
_version_ | 1782479806646976512 |
---|---|
author | Tataru, Paula Sand, Andreas Hobolth, Asger Mailund, Thomas Pedersen, Christian N. S. |
author_facet | Tataru, Paula Sand, Andreas Hobolth, Asger Mailund, Thomas Pedersen, Christian N. S. |
author_sort | Tataru, Paula |
collection | PubMed |
description | Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper, we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model. |
format | Online Article Text |
id | pubmed-4009796 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-40097962014-05-07 Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions Tataru, Paula Sand, Andreas Hobolth, Asger Mailund, Thomas Pedersen, Christian N. S. Biology (Basel) Article Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper, we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model. MDPI 2013-11-08 /pmc/articles/PMC4009796/ /pubmed/24833225 http://dx.doi.org/10.3390/biology2041282 Text en © 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/). |
spellingShingle | Article Tataru, Paula Sand, Andreas Hobolth, Asger Mailund, Thomas Pedersen, Christian N. S. Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions |
title | Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions |
title_full | Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions |
title_fullStr | Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions |
title_full_unstemmed | Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions |
title_short | Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions |
title_sort | algorithms for hidden markov models restricted to occurrences of regular expressions |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4009796/ https://www.ncbi.nlm.nih.gov/pubmed/24833225 http://dx.doi.org/10.3390/biology2041282 |
work_keys_str_mv | AT tatarupaula algorithmsforhiddenmarkovmodelsrestrictedtooccurrencesofregularexpressions AT sandandreas algorithmsforhiddenmarkovmodelsrestrictedtooccurrencesofregularexpressions AT hobolthasger algorithmsforhiddenmarkovmodelsrestrictedtooccurrencesofregularexpressions AT mailundthomas algorithmsforhiddenmarkovmodelsrestrictedtooccurrencesofregularexpressions AT pedersenchristianns algorithmsforhiddenmarkovmodelsrestrictedtooccurrencesofregularexpressions |