Hidden Markov Model Variants and their Application
Markov statistical methods may make it possible to develop an unsupervised learning process that can automatically identify genomic structure in prokaryotes in a comprehensive way. This approach is based on mutual information, probabilistic measures, hidden Markov models, and other purely statistica...
Main author: | Winters-Hilt, Stephen |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2006 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1683574/ https://www.ncbi.nlm.nih.gov/pubmed/17118135 http://dx.doi.org/10.1186/1471-2105-7-S2-S14 |
_version_ | 1782131172786044928 |
---|---|
author | Winters-Hilt, Stephen |
author_facet | Winters-Hilt, Stephen |
author_sort | Winters-Hilt, Stephen |
collection | PubMed |
description | Markov statistical methods may make it possible to develop an unsupervised learning process that can automatically identify genomic structure in prokaryotes in a comprehensive way. This approach is based on mutual information, probabilistic measures, hidden Markov models, and other purely statistical inputs. This approach also provides a uniquely common ground for comparative prokaryotic genomics. The approach is an on-going effort by its nature, as a multi-pass learning process, where each round is more informed than the last, and thereby allows a shift to the more powerful methods available for supervised learning at each iteration. It is envisaged that this "bootstrap" learning process will also be useful as a knowledge discovery tool. For such an ab initio prokaryotic gene-finder to work, however, it needs a mechanism to identify critical motif structure, such as those around the start of coding or start of transcription (and then, hopefully more). For eukaryotes, even with better start-of-coding identification, parsing of eukaryotic coding regions by the HMM is still limited by the HMM's single gene assumption, as evidenced by the poor performance in alternatively spliced regions. To address these complications an approach is described to expand the states in a eukaryotic gene-predictor HMM, to operate with two layers of DNA parsing. This extension from the single layer gene prediction parse is indicated after preliminary analysis of the C. elegans alt-splice statistics. State profiles have made use of a novel hash-interpolating MM (hIMM) method. A new implementation for an HMM-with-Duration is also described, with far-reaching application to gene-structure identification and analysis of channel current blockade data. |
format | Text |
id | pubmed-1683574 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1683574 2006-12-05 Hidden Markov Model Variants and their Application Winters-Hilt, Stephen BMC Bioinformatics Proceedings Markov statistical methods may make it possible to develop an unsupervised learning process that can automatically identify genomic structure in prokaryotes in a comprehensive way. This approach is based on mutual information, probabilistic measures, hidden Markov models, and other purely statistical inputs. This approach also provides a uniquely common ground for comparative prokaryotic genomics. The approach is an on-going effort by its nature, as a multi-pass learning process, where each round is more informed than the last, and thereby allows a shift to the more powerful methods available for supervised learning at each iteration. It is envisaged that this "bootstrap" learning process will also be useful as a knowledge discovery tool. For such an ab initio prokaryotic gene-finder to work, however, it needs a mechanism to identify critical motif structure, such as those around the start of coding or start of transcription (and then, hopefully more). For eukaryotes, even with better start-of-coding identification, parsing of eukaryotic coding regions by the HMM is still limited by the HMM's single gene assumption, as evidenced by the poor performance in alternatively spliced regions. To address these complications an approach is described to expand the states in a eukaryotic gene-predictor HMM, to operate with two layers of DNA parsing. This extension from the single layer gene prediction parse is indicated after preliminary analysis of the C. elegans alt-splice statistics. State profiles have made use of a novel hash-interpolating MM (hIMM) method. A new implementation for an HMM-with-Duration is also described, with far-reaching application to gene-structure identification and analysis of channel current blockade data. BioMed Central 2006-09-26 /pmc/articles/PMC1683574/ /pubmed/17118135 http://dx.doi.org/10.1186/1471-2105-7-S2-S14 Text en Copyright © 2006 Winters-Hilt; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Winters-Hilt, Stephen Hidden Markov Model Variants and their Application |
title | Hidden Markov Model Variants and their Application |
title_full | Hidden Markov Model Variants and their Application |
title_fullStr | Hidden Markov Model Variants and their Application |
title_full_unstemmed | Hidden Markov Model Variants and their Application |
title_short | Hidden Markov Model Variants and their Application |
title_sort | hidden markov model variants and their application |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1683574/ https://www.ncbi.nlm.nih.gov/pubmed/17118135 http://dx.doi.org/10.1186/1471-2105-7-S2-S14 |
work_keys_str_mv | AT wintershiltstephen hiddenmarkovmodelvariantsandtheirapplication |