
Hidden Markov Model Variants and their Application



Bibliographic details
Main author: Winters-Hilt, Stephen
Format: Text
Language: English
Published: BioMed Central 2006
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1683574/
https://www.ncbi.nlm.nih.gov/pubmed/17118135
http://dx.doi.org/10.1186/1471-2105-7-S2-S14
Description: Markov statistical methods may make it possible to develop an unsupervised learning process that can automatically and comprehensively identify genomic structure in prokaryotes. This approach is based on mutual information, probabilistic measures, hidden Markov models, and other purely statistical inputs. It also provides a uniquely common ground for comparative prokaryotic genomics. By its nature the approach is an ongoing effort: a multi-pass learning process in which each round is better informed than the last, allowing a shift at each iteration toward the more powerful methods available for supervised learning. It is envisaged that this "bootstrap" learning process will also be useful as a knowledge discovery tool. For such an ab initio prokaryotic gene-finder to work, however, it needs a mechanism to identify critical motif structures, such as those around the start of coding or the start of transcription (and, ideally, others). For eukaryotes, even with better start-of-coding identification, HMM parsing of coding regions is still limited by the HMM's single-gene assumption, as evidenced by its poor performance in alternatively spliced regions. To address these complications, an approach is described that expands the states in a eukaryotic gene-predictor HMM so that it operates with two layers of DNA parsing. This extension of the single-layer gene-prediction parse is motivated by a preliminary analysis of C. elegans alternative-splicing statistics. State profiles make use of a novel hash-interpolating Markov model (hIMM) method. A new implementation of an HMM-with-Duration is also described, with broad applications to gene-structure identification and to the analysis of channel current blockade data.
Journal: BMC Bioinformatics (Proceedings), published 2006-09-26
Copyright © 2006 Winters-Hilt; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.