Comprehensive comparison of graph based multiple protein sequence alignment strategies
BACKGROUND: Alignment of protein sequences (MPSA) is the starting point for a multitude of applications in molecular biology. Here, we present a novel MPSA program based on the SeqAn sequence alignment library. Our implementation has a strict modular structure, which allows to swap different compone...
Main authors: | Plyusnin, Ilya; Holm, Liisa |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3375188/ https://www.ncbi.nlm.nih.gov/pubmed/22540977 http://dx.doi.org/10.1186/1471-2105-13-64 |
_version_ | 1782235725658324992 |
---|---|
author | Plyusnin, Ilya Holm, Liisa |
author_facet | Plyusnin, Ilya Holm, Liisa |
author_sort | Plyusnin, Ilya |
collection | PubMed |
description | BACKGROUND: Alignment of protein sequences (MPSA) is the starting point for a multitude of applications in molecular biology. Here, we present a novel MPSA program based on the SeqAn sequence alignment library. Our implementation has a strict modular structure, which allows to swap different components of the alignment process and, thus, to investigate their contribution to the alignment quality and computation time. We systematically varied information sources, guiding trees, score transformations and iterative refinement options, and evaluated the resulting alignments on BAliBASE and SABmark. RESULTS: Our results indicate the optimal alignment strategy based on the choices compared. First, we show that pairwise global and local alignments contain sufficient information to construct a high quality multiple alignment. Second, single linkage clustering is almost invariably the best algorithm to build a guiding tree for progressive alignment. Third, triplet library extension, with introduction of new edges, is the most efficient consistency transformation of those compared. Alternatively, one can apply tree dependent partitioning as a post processing step, which was shown to be comparable with the best consistency transformation in both time and accuracy. Finally, propagating information beyond four transitive links introduces more noise than signal. CONCLUSIONS: This is the first time multiple protein alignment strategies are comprehensively and clearly compared using a single implementation platform. In particular, we showed which of the existing consistency transformations and iterative refinement techniques are the most valid. Our implementation is freely available at http://ekhidna.biocenter.helsinki.fi/MMSA and as a supplementary file attached to this article (see Additional file 1). |
format | Online Article Text |
id | pubmed-3375188 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3375188 2012-06-15 Comprehensive comparison of graph based multiple protein sequence alignment strategies Plyusnin, Ilya Holm, Liisa BMC Bioinformatics Research Article BACKGROUND: Alignment of protein sequences (MPSA) is the starting point for a multitude of applications in molecular biology. Here, we present a novel MPSA program based on the SeqAn sequence alignment library. Our implementation has a strict modular structure, which allows to swap different components of the alignment process and, thus, to investigate their contribution to the alignment quality and computation time. We systematically varied information sources, guiding trees, score transformations and iterative refinement options, and evaluated the resulting alignments on BAliBASE and SABmark. RESULTS: Our results indicate the optimal alignment strategy based on the choices compared. First, we show that pairwise global and local alignments contain sufficient information to construct a high quality multiple alignment. Second, single linkage clustering is almost invariably the best algorithm to build a guiding tree for progressive alignment. Third, triplet library extension, with introduction of new edges, is the most efficient consistency transformation of those compared. Alternatively, one can apply tree dependent partitioning as a post processing step, which was shown to be comparable with the best consistency transformation in both time and accuracy. Finally, propagating information beyond four transitive links introduces more noise than signal. CONCLUSIONS: This is the first time multiple protein alignment strategies are comprehensively and clearly compared using a single implementation platform. In particular, we showed which of the existing consistency transformations and iterative refinement techniques are the most valid. Our implementation is freely available at http://ekhidna.biocenter.helsinki.fi/MMSA and as a supplementary file attached to this article (see Additional file 1). BioMed Central 2012-04-29 /pmc/articles/PMC3375188/ /pubmed/22540977 http://dx.doi.org/10.1186/1471-2105-13-64 Text en Copyright ©2012 Plyusnin and Holm; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Plyusnin, Ilya Holm, Liisa Comprehensive comparison of graph based multiple protein sequence alignment strategies |
title | Comprehensive comparison of graph based multiple protein sequence alignment strategies |
title_full | Comprehensive comparison of graph based multiple protein sequence alignment strategies |
title_fullStr | Comprehensive comparison of graph based multiple protein sequence alignment strategies |
title_full_unstemmed | Comprehensive comparison of graph based multiple protein sequence alignment strategies |
title_short | Comprehensive comparison of graph based multiple protein sequence alignment strategies |
title_sort | comprehensive comparison of graph based multiple protein sequence alignment strategies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3375188/ https://www.ncbi.nlm.nih.gov/pubmed/22540977 http://dx.doi.org/10.1186/1471-2105-13-64 |
work_keys_str_mv | AT plyusninilya comprehensivecomparisonofgraphbasedmultipleproteinsequencealignmentstrategies AT holmliisa comprehensivecomparisonofgraphbasedmultipleproteinsequencealignmentstrategies |