Cargando…

zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm

BACKGROUND: Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms. Calculating the likelihood of a model using the forward algorithm has worst case time complexity linear in the length of the sequence and quadratic in the number...

Descripción completa

Detalles Bibliográficos
Autores principales: Sand, Andreas, Kristiansen, Martin, Pedersen, Christian NS, Mailund, Thomas
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2013
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4222747/
https://www.ncbi.nlm.nih.gov/pubmed/24266924
http://dx.doi.org/10.1186/1471-2105-14-339
_version_ 1782343095019372544
author Sand, Andreas
Kristiansen, Martin
Pedersen, Christian NS
Mailund, Thomas
author_facet Sand, Andreas
Kristiansen, Martin
Pedersen, Christian NS
Mailund, Thomas
author_sort Sand, Andreas
collection PubMed
description BACKGROUND: Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms. Calculating the likelihood of a model using the forward algorithm has worst case time complexity linear in the length of the sequence and quadratic in the number of states in the model. For genome analysis, however, the length runs to millions or billions of observations, and when maximising the likelihood hundreds of evaluations are often needed. A time efficient forward algorithm is therefore a key ingredient in an efficient hidden Markov model library. RESULTS: We have built a software library for efficiently computing the likelihood of a hidden Markov model. The library exploits commonly occurring substrings in the input to reuse computations in the forward algorithm. In a pre-processing step our library identifies common substrings and builds a structure over the computations in the forward algorithm which can be reused. This analysis can be saved between uses of the library and is independent of concrete hidden Markov models so one preprocessing can be used to run a number of different models. Using this library, we achieve up to 78 times shorter wall-clock time for realistic whole-genome analyses with a real and reasonably complex hidden Markov model. In one particular case the analysis was performed in less than 8 minutes compared to 9.6 hours for the previously fastest library. CONCLUSIONS: We have implemented the preprocessing procedure and forward algorithm as a C++ library, zipHMM, with Python bindings for use in scripts. The library is available at http://birc.au.dk/software/ziphmm/.
format Online
Article
Text
id pubmed-4222747
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-42227472014-11-10 zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm Sand, Andreas Kristiansen, Martin Pedersen, Christian NS Mailund, Thomas BMC Bioinformatics Software BACKGROUND: Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms. Calculating the likelihood of a model using the forward algorithm has worst case time complexity linear in the length of the sequence and quadratic in the number of states in the model. For genome analysis, however, the length runs to millions or billions of observations, and when maximising the likelihood hundreds of evaluations are often needed. A time efficient forward algorithm is therefore a key ingredient in an efficient hidden Markov model library. RESULTS: We have built a software library for efficiently computing the likelihood of a hidden Markov model. The library exploits commonly occurring substrings in the input to reuse computations in the forward algorithm. In a pre-processing step our library identifies common substrings and builds a structure over the computations in the forward algorithm which can be reused. This analysis can be saved between uses of the library and is independent of concrete hidden Markov models so one preprocessing can be used to run a number of different models. Using this library, we achieve up to 78 times shorter wall-clock time for realistic whole-genome analyses with a real and reasonably complex hidden Markov model. In one particular case the analysis was performed in less than 8 minutes compared to 9.6 hours for the previously fastest library. CONCLUSIONS: We have implemented the preprocessing procedure and forward algorithm as a C++ library, zipHMM, with Python bindings for use in scripts. The library is available at http://birc.au.dk/software/ziphmm/. BioMed Central 2013-11-22 /pmc/articles/PMC4222747/ /pubmed/24266924 http://dx.doi.org/10.1186/1471-2105-14-339 Text en Copyright © 2013 Sand et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Sand, Andreas
Kristiansen, Martin
Pedersen, Christian NS
Mailund, Thomas
zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm
title zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm
title_full zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm
title_fullStr zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm
title_full_unstemmed zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm
title_short zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm
title_sort ziphmmlib: a highly optimised hmm library exploiting repetitions in the input to speed up the forward algorithm
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4222747/
https://www.ncbi.nlm.nih.gov/pubmed/24266924
http://dx.doi.org/10.1186/1471-2105-14-339
work_keys_str_mv AT sandandreas ziphmmlibahighlyoptimisedhmmlibraryexploitingrepetitionsintheinputtospeeduptheforwardalgorithm
AT kristiansenmartin ziphmmlibahighlyoptimisedhmmlibraryexploitingrepetitionsintheinputtospeeduptheforwardalgorithm
AT pedersenchristianns ziphmmlibahighlyoptimisedhmmlibraryexploitingrepetitionsintheinputtospeeduptheforwardalgorithm
AT mailundthomas ziphmmlibahighlyoptimisedhmmlibraryexploitingrepetitionsintheinputtospeeduptheforwardalgorithm