Cargando…

Sequence alignment by passing messages

BACKGROUND: Sequence alignment has become an indispensable tool in modern molecular biology research, and probabilistic sequence alignment models have been shown to provide an effective framework for building accurate sequence alignment tools. One such example is the pair hidden Markov model (pair-H...

Descripción completa

Detalles Bibliográficos
Autor principal: Yoon, Byung-Jun
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046711/
https://www.ncbi.nlm.nih.gov/pubmed/24564436
http://dx.doi.org/10.1186/1471-2164-15-S1-S14
_version_ 1782480302280540160
author Yoon, Byung-Jun
author_facet Yoon, Byung-Jun
author_sort Yoon, Byung-Jun
collection PubMed
description BACKGROUND: Sequence alignment has become an indispensable tool in modern molecular biology research, and probabilistic sequence alignment models have been shown to provide an effective framework for building accurate sequence alignment tools. One such example is the pair hidden Markov model (pair-HMM), which has been especially popular in comparative sequence analysis for several reasons, including their effectiveness in modeling and detecting sequence homology, model simplicity, and the existence of efficient algorithms for applying the model to sequence alignment problems. However, despite these advantages, pair-HMMs also have a number of practical limitations that may degrade their alignment performance or render them unsuitable for certain alignment tasks. RESULTS: In this work, we propose a novel scheme for comparing and aligning biological sequences that can effectively address the shortcomings of the traditional pair-HMMs. The proposed scheme is based on a simple message-passing approach, where messages are exchanged between neighboring symbol pairs that may be potentially aligned in the optimal sequence alignment. The message-passing process yields probabilistic symbol alignment confidence scores, which may be used for predicting the optimal alignment that maximizes the expected number of correctly aligned symbol pairs. CONCLUSIONS: Extensive performance evaluation on protein alignment benchmark datasets shows that the proposed message-passing scheme clearly outperforms the traditional pair-HMM-based approach, in terms of both alignment accuracy and computational efficiency. Furthermore, the proposed scheme is numerically robust and amenable to massive parallelization.
format Online
Article
Text
id pubmed-4046711
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-40467112014-06-06 Sequence alignment by passing messages Yoon, Byung-Jun BMC Genomics Proceedings BACKGROUND: Sequence alignment has become an indispensable tool in modern molecular biology research, and probabilistic sequence alignment models have been shown to provide an effective framework for building accurate sequence alignment tools. One such example is the pair hidden Markov model (pair-HMM), which has been especially popular in comparative sequence analysis for several reasons, including their effectiveness in modeling and detecting sequence homology, model simplicity, and the existence of efficient algorithms for applying the model to sequence alignment problems. However, despite these advantages, pair-HMMs also have a number of practical limitations that may degrade their alignment performance or render them unsuitable for certain alignment tasks. RESULTS: In this work, we propose a novel scheme for comparing and aligning biological sequences that can effectively address the shortcomings of the traditional pair-HMMs. The proposed scheme is based on a simple message-passing approach, where messages are exchanged between neighboring symbol pairs that may be potentially aligned in the optimal sequence alignment. The message-passing process yields probabilistic symbol alignment confidence scores, which may be used for predicting the optimal alignment that maximizes the expected number of correctly aligned symbol pairs. CONCLUSIONS: Extensive performance evaluation on protein alignment benchmark datasets shows that the proposed message-passing scheme clearly outperforms the traditional pair-HMM-based approach, in terms of both alignment accuracy and computational efficiency. Furthermore, the proposed scheme is numerically robust and amenable to massive parallelization. BioMed Central 2014-01-24 /pmc/articles/PMC4046711/ /pubmed/24564436 http://dx.doi.org/10.1186/1471-2164-15-S1-S14 Text en © Yoon; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Proceedings
Yoon, Byung-Jun
Sequence alignment by passing messages
title Sequence alignment by passing messages
title_full Sequence alignment by passing messages
title_fullStr Sequence alignment by passing messages
title_full_unstemmed Sequence alignment by passing messages
title_short Sequence alignment by passing messages
title_sort sequence alignment by passing messages
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046711/
https://www.ncbi.nlm.nih.gov/pubmed/24564436
http://dx.doi.org/10.1186/1471-2164-15-S1-S14
work_keys_str_mv AT yoonbyungjun sequencealignmentbypassingmessages