Cargando…

Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions

Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed...

Descripción completa

Detalles Bibliográficos
Autores principales: Tataru, Paula, Sand, Andreas, Hobolth, Asger, Mailund, Thomas, Pedersen, Christian N. S.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2013
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4009796/
https://www.ncbi.nlm.nih.gov/pubmed/24833225
http://dx.doi.org/10.3390/biology2041282
_version_ 1782479806646976512
author Tataru, Paula
Sand, Andreas
Hobolth, Asger
Mailund, Thomas
Pedersen, Christian N. S.
author_facet Tataru, Paula
Sand, Andreas
Hobolth, Asger
Mailund, Thomas
Pedersen, Christian N. S.
author_sort Tataru, Paula
collection PubMed
description Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper, we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model.
format Online
Article
Text
id pubmed-4009796
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-40097962014-05-07 Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions Tataru, Paula Sand, Andreas Hobolth, Asger Mailund, Thomas Pedersen, Christian N. S. Biology (Basel) Article Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper, we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model. MDPI 2013-11-08 /pmc/articles/PMC4009796/ /pubmed/24833225 http://dx.doi.org/10.3390/biology2041282 Text en © 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
spellingShingle Article
Tataru, Paula
Sand, Andreas
Hobolth, Asger
Mailund, Thomas
Pedersen, Christian N. S.
Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions
title Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions
title_full Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions
title_fullStr Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions
title_full_unstemmed Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions
title_short Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions
title_sort algorithms for hidden markov models restricted to occurrences of regular expressions
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4009796/
https://www.ncbi.nlm.nih.gov/pubmed/24833225
http://dx.doi.org/10.3390/biology2041282
work_keys_str_mv AT tatarupaula algorithmsforhiddenmarkovmodelsrestrictedtooccurrencesofregularexpressions
AT sandandreas algorithmsforhiddenmarkovmodelsrestrictedtooccurrencesofregularexpressions
AT hobolthasger algorithmsforhiddenmarkovmodelsrestrictedtooccurrencesofregularexpressions
AT mailundthomas algorithmsforhiddenmarkovmodelsrestrictedtooccurrencesofregularexpressions
AT pedersenchristianns algorithmsforhiddenmarkovmodelsrestrictedtooccurrencesofregularexpressions