
Clustering ionic flow blockade toggles with a Mixture of HMMs

BACKGROUND: Ionic current blockade signal processing, for use in nanopore detection, offers a promising new way to analyze single molecule properties with potential implications for DNA sequencing. The α-Hemolysin transmembrane channel interacts with a translocating molecule in a nontrivial way, fre...


Bibliographic Details
Main authors: Churbanov, Alexander, Winters-Hilt, Stephen
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2537564/
https://www.ncbi.nlm.nih.gov/pubmed/18793458
http://dx.doi.org/10.1186/1471-2105-9-S9-S13
_version_ 1782159107759800320
author Churbanov, Alexander
Winters-Hilt, Stephen
author_facet Churbanov, Alexander
Winters-Hilt, Stephen
author_sort Churbanov, Alexander
collection PubMed
description BACKGROUND: Ionic current blockade signal processing, for use in nanopore detection, offers a promising new way to analyze single molecule properties with potential implications for DNA sequencing. The α-Hemolysin transmembrane channel interacts with a translocating molecule in a nontrivial way, frequently evidenced by a complex ionic flow blockade pattern with readily distinguishable modes of toggling. Effective processing of such signals requires developing machine learning methods capable of learning the various blockade modes for classification and knowledge discovery purposes. Here we propose a method aimed at improving our stochastic analysis capabilities to better understand the discriminatory capabilities of the observed nanopore channel interactions with analyte. RESULTS: We tailored our memory-sparse distributed implementation of a Mixture of Hidden Markov Models (MHMMs) to the problem of channel current blockade clustering and associated analyte classification. By using probabilistic fully connected HMM profiles as mixture components we were able to cluster the various 9 base-pair hairpin channel blockades. We obtained very high Maximum a Posteriori (MAP) classification accuracy with a mixture of 12 different channel blockade profiles, each with 4 levels, a configuration that can be computed with sufficient speed for real-time experimental feedback. MAP classification performance depends on several factors such as the number of mixture components, the number of levels in each profile, and the duration of a channel blockade event. We distribute Baum-Welch Expectation Maximization (EM) algorithms running on our model in two ways. A distributed implementation of the MHMM data processing accelerates data clustering efforts.
The second, simultaneous strategy uses an EM checkpointing algorithm to lower memory use and efficiently distribute the bulk of EM processing when handling large data sequences (such as for the progressive sums used in the HMM parameter estimates). CONCLUSION: The proposed distributed MHMM method has many appealing properties, such as precise classification of analyte in real-time scenarios, and the ability to incorporate new domain knowledge into a flexible, easily distributable architecture. The distributed HMM provides a feature extraction that is equivalent to that of the sequential HMM with a speedup factor approximately equal to the number of independent CPUs operating on the data. The MHMM topology learns clusters existing within data samples via distributed HMM EM learning. A Java implementation of the MHMM algorithm is available at .
format Text
id pubmed-2537564
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2537564 2008-09-17 Clustering ionic flow blockade toggles with a Mixture of HMMs Churbanov, Alexander Winters-Hilt, Stephen BMC Bioinformatics Proceedings BACKGROUND: Ionic current blockade signal processing, for use in nanopore detection, offers a promising new way to analyze single molecule properties with potential implications for DNA sequencing. The α-Hemolysin transmembrane channel interacts with a translocating molecule in a nontrivial way, frequently evidenced by a complex ionic flow blockade pattern with readily distinguishable modes of toggling. Effective processing of such signals requires developing machine learning methods capable of learning the various blockade modes for classification and knowledge discovery purposes. Here we propose a method aimed at improving our stochastic analysis capabilities to better understand the discriminatory capabilities of the observed nanopore channel interactions with analyte. RESULTS: We tailored our memory-sparse distributed implementation of a Mixture of Hidden Markov Models (MHMMs) to the problem of channel current blockade clustering and associated analyte classification. By using probabilistic fully connected HMM profiles as mixture components we were able to cluster the various 9 base-pair hairpin channel blockades. We obtained very high Maximum a Posteriori (MAP) classification accuracy with a mixture of 12 different channel blockade profiles, each with 4 levels, a configuration that can be computed with sufficient speed for real-time experimental feedback. MAP classification performance depends on several factors such as the number of mixture components, the number of levels in each profile, and the duration of a channel blockade event. We distribute Baum-Welch Expectation Maximization (EM) algorithms running on our model in two ways. A distributed implementation of the MHMM data processing accelerates data clustering efforts.
The second, simultaneous strategy uses an EM checkpointing algorithm to lower memory use and efficiently distribute the bulk of EM processing when handling large data sequences (such as for the progressive sums used in the HMM parameter estimates). CONCLUSION: The proposed distributed MHMM method has many appealing properties, such as precise classification of analyte in real-time scenarios, and the ability to incorporate new domain knowledge into a flexible, easily distributable architecture. The distributed HMM provides a feature extraction that is equivalent to that of the sequential HMM with a speedup factor approximately equal to the number of independent CPUs operating on the data. The MHMM topology learns clusters existing within data samples via distributed HMM EM learning. A Java implementation of the MHMM algorithm is available at . BioMed Central 2008-08-12 /pmc/articles/PMC2537564/ /pubmed/18793458 http://dx.doi.org/10.1186/1471-2105-9-S9-S13 Text en Copyright © 2008 Churbanov and Winters-Hilt; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Churbanov, Alexander
Winters-Hilt, Stephen
Clustering ionic flow blockade toggles with a Mixture of HMMs
title Clustering ionic flow blockade toggles with a Mixture of HMMs
title_full Clustering ionic flow blockade toggles with a Mixture of HMMs
title_fullStr Clustering ionic flow blockade toggles with a Mixture of HMMs
title_full_unstemmed Clustering ionic flow blockade toggles with a Mixture of HMMs
title_short Clustering ionic flow blockade toggles with a Mixture of HMMs
title_sort clustering ionic flow blockade toggles with a mixture of hmms
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2537564/
https://www.ncbi.nlm.nih.gov/pubmed/18793458
http://dx.doi.org/10.1186/1471-2105-9-S9-S13
work_keys_str_mv AT churbanovalexander clusteringionicflowblockadetoggleswithamixtureofhmms
AT wintershiltstephen clusteringionicflowblockadetoggleswithamixtureofhmms