Comprehensive comparison of graph based multiple protein sequence alignment strategies

BACKGROUND: Multiple alignment of protein sequences (MPSA) is the starting point for a multitude of applications in molecular biology. Here, we present a novel MPSA program based on the SeqAn sequence alignment library. Our implementation has a strict modular structure, which allows swapping different compone...

Bibliographic Details
Main authors: Plyusnin, Ilya, Holm, Liisa
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3375188/
https://www.ncbi.nlm.nih.gov/pubmed/22540977
http://dx.doi.org/10.1186/1471-2105-13-64
author Plyusnin, Ilya
Holm, Liisa
collection PubMed
description BACKGROUND: Multiple alignment of protein sequences (MPSA) is the starting point for a multitude of applications in molecular biology. Here, we present a novel MPSA program based on the SeqAn sequence alignment library. Our implementation has a strict modular structure, which allows swapping different components of the alignment process and, thus, investigating their contribution to alignment quality and computation time. We systematically varied information sources, guiding trees, score transformations and iterative refinement options, and evaluated the resulting alignments on BAliBASE and SABmark. RESULTS: Our results indicate the optimal alignment strategy among the choices compared. First, we show that pairwise global and local alignments contain sufficient information to construct a high-quality multiple alignment. Second, single linkage clustering is almost invariably the best algorithm for building a guiding tree for progressive alignment. Third, triplet library extension, with introduction of new edges, is the most efficient consistency transformation of those compared. Alternatively, one can apply tree-dependent partitioning as a post-processing step, which was shown to be comparable with the best consistency transformation in both time and accuracy. Finally, propagating information beyond four transitive links introduces more noise than signal. CONCLUSIONS: This is the first time multiple protein alignment strategies have been comprehensively and clearly compared using a single implementation platform. In particular, we showed which of the existing consistency transformations and iterative refinement techniques are the most valid. Our implementation is freely available at http://ekhidna.biocenter.helsinki.fi/MMSA and as a supplementary file attached to this article (see Additional file 1).
format Online
Article
Text
id pubmed-3375188
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3375188 2012-06-15 Comprehensive comparison of graph based multiple protein sequence alignment strategies Plyusnin, Ilya Holm, Liisa BMC Bioinformatics Research Article BioMed Central 2012-04-29 /pmc/articles/PMC3375188/ /pubmed/22540977 http://dx.doi.org/10.1186/1471-2105-13-64 Text en Copyright ©2012 Plyusnin and Holm; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comprehensive comparison of graph based multiple protein sequence alignment strategies
topic Research Article