Hidden Markov Model Variants and their Application

Markov statistical methods may make it possible to develop an unsupervised learning process that can automatically identify genomic structure in prokaryotes in a comprehensive way. This approach is based on mutual information, probabilistic measures, hidden Markov models, and other purely statistica...

Bibliographic Details
Main author:	Winters-Hilt, Stephen
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1683574/ https://www.ncbi.nlm.nih.gov/pubmed/17118135 http://dx.doi.org/10.1186/1471-2105-7-S2-S14

author	Winters-Hilt, Stephen
collection	PubMed
description	Markov statistical methods may make it possible to develop an unsupervised learning process that can automatically identify genomic structure in prokaryotes in a comprehensive way. This approach is based on mutual information, probabilistic measures, hidden Markov models, and other purely statistical inputs. This approach also provides a uniquely common ground for comparative prokaryotic genomics. The approach is an on-going effort by its nature, as a multi-pass learning process, where each round is more informed than the last, and thereby allows a shift to the more powerful methods available for supervised learning at each iteration. It is envisaged that this "bootstrap" learning process will also be useful as a knowledge discovery tool. For such an ab initio prokaryotic gene-finder to work, however, it needs a mechanism to identify critical motif structure, such as those around the start of coding or start of transcription (and then, hopefully more). For eukaryotes, even with better start-of-coding identification, parsing of eukaryotic coding regions by the HMM is still limited by the HMM's single gene assumption, as evidenced by the poor performance in alternatively spliced regions. To address these complications an approach is described to expand the states in a eukaryotic gene-predictor HMM, to operate with two layers of DNA parsing. This extension from the single layer gene prediction parse is indicated after preliminary analysis of the C. elegans alt-splice statistics. State profiles have made use of a novel hash-interpolating MM (hIMM) method. A new implementation for an HMM-with-Duration is also described, with far-reaching application to gene-structure identification and analysis of channel current blockade data.
format	Text
id	pubmed-1683574
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1683574 2006-12-05 Hidden Markov Model Variants and their Application Winters-Hilt, Stephen BMC Bioinformatics Proceedings BioMed Central 2006-09-26 /pmc/articles/PMC1683574/ /pubmed/17118135 http://dx.doi.org/10.1186/1471-2105-7-S2-S14 Text en Copyright © 2006 Winters-Hilt; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Hidden Markov Model Variants and their Application
topic	Proceedings
