
A linear memory algorithm for Baum-Welch training



Bibliographic Details
Main Authors: Miklós, István; Meyer, Irmtraud M
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1262623/
https://www.ncbi.nlm.nih.gov/pubmed/16171529
http://dx.doi.org/10.1186/1471-2105-6-231
_version_ 1782125879986487296
author Miklós, István
Meyer, Irmtraud M
author_facet Miklós, István
Meyer, Irmtraud M
author_sort Miklós, István
collection PubMed
description BACKGROUND: Baum-Welch training is an expectation-maximisation algorithm for training the emission and transition probabilities of hidden Markov models in a fully automated way. It can be employed as long as a training set of annotated sequences is known, and provides a rigorous way to derive parameter values which are guaranteed to be at least locally optimal. For complex hidden Markov models such as pair hidden Markov models and very long training sequences, even the most efficient algorithms for Baum-Welch training are currently too memory-consuming. This has so far effectively prevented the automatic parameter training of hidden Markov models that are currently used for biological sequence analyses. RESULTS: We introduce the first linear space algorithm for Baum-Welch training. For a hidden Markov model with M states, T free transition and E free emission parameters, and an input sequence of length L, our new algorithm requires O(M) memory and O(L M T_max (T + E)) time for one Baum-Welch iteration, where T_max is the maximum number of states that any state is connected to. The most memory efficient algorithm until now was the checkpointing algorithm with O(log(L) M) memory and O(log(L) L M T_max) time requirement. Our novel algorithm thus renders the memory requirement completely independent of the length of the training sequences. More generally, for an n-hidden Markov model and n input sequences of length L, the memory requirement of O(log(L) L^(n-1) M) is reduced to O(L^(n-1) M) memory while the running time is changed from O(log(L) L^n M T_max + L^n (T + E)) to O(L^n M T_max (T + E)). An added advantage of our new algorithm is that a reduced time requirement can be traded for an increased memory requirement and vice versa, such that for any c ∈ {1, ..., (T + E)}, a time requirement of L^n M T_max c incurs a memory requirement of L^(n-1) M (T + E - c).
CONCLUSION: For the large class of hidden Markov models used for example in gene prediction, whose number of states does not scale with the length of the input sequence, our novel algorithm can thus be both faster and more memory-efficient than any of the existing algorithms.
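For context on the quantities M, L, T and E above, the following is a minimal sketch of one conventional Baum-Welch iteration for a discrete hidden Markov model — the standard forward-backward version with O(L·M) memory that the paper's linear-memory algorithm improves on. This is not the paper's algorithm; all function and variable names here are illustrative.

```python
import numpy as np

def baum_welch_iteration(obs, A, B, pi):
    """One standard Baum-Welch (EM) iteration for a discrete HMM.

    Stores the full L x M forward and backward tables, i.e. O(L*M)
    memory -- the baseline that linear-memory variants avoid.

    obs : sequence of observed symbol indices, length L
    A   : M x M transition matrix (row-stochastic)
    B   : M x K emission matrix (row-stochastic)
    pi  : initial state distribution, length M
    """
    L, M = len(obs), A.shape[0]

    # Forward pass with per-position scaling to avoid numerical underflow.
    alpha = np.zeros((L, M))
    scale = np.zeros(L)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, L):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling factors.
    beta = np.zeros((L, M))
    beta[-1] = 1.0
    for t in range(L - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

    # E-step: expected state occupancies and transition counts.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((M, M))
    for t in range(L - 1):
        x = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi += x / x.sum()

    # M-step: re-estimate parameters from the expected counts.
    new_pi = gamma[0]
    new_A = xi / xi.sum(axis=1, keepdims=True)
    new_B = np.zeros_like(B)
    for t in range(L):
        new_B[:, obs[t]] += gamma[t]
    new_B /= new_B.sum(axis=1, keepdims=True)
    return new_A, new_B, new_pi
```

The memory cost comes from the `alpha` and `beta` tables, each of size L × M; the paper's contribution is to compute the same expected counts while keeping only O(M)-sized state, at the cost of extra recomputation that can be traded against memory.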
format Text
id pubmed-1262623
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1262623 2005-10-26 A linear memory algorithm for Baum-Welch training Miklós, István; Meyer, Irmtraud M. BMC Bioinformatics, Research Article. BioMed Central 2005-09-19 /pmc/articles/PMC1262623/ /pubmed/16171529 http://dx.doi.org/10.1186/1471-2105-6-231 Text en Copyright © 2005 Miklós and Meyer; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Miklós, István
Meyer, Irmtraud M
A linear memory algorithm for Baum-Welch training
title A linear memory algorithm for Baum-Welch training
title_full A linear memory algorithm for Baum-Welch training
title_fullStr A linear memory algorithm for Baum-Welch training
title_full_unstemmed A linear memory algorithm for Baum-Welch training
title_short A linear memory algorithm for Baum-Welch training
title_sort linear memory algorithm for baum-welch training
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1262623/
https://www.ncbi.nlm.nih.gov/pubmed/16171529
http://dx.doi.org/10.1186/1471-2105-6-231
work_keys_str_mv AT miklosistvan alinearmemoryalgorithmforbaumwelchtraining
AT meyerirmtraudm alinearmemoryalgorithmforbaumwelchtraining
AT miklosistvan linearmemoryalgorithmforbaumwelchtraining
AT meyerirmtraudm linearmemoryalgorithmforbaumwelchtraining