DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment

Bibliographic Details
Main authors: Subramanian, Amarendran R, Weyer-Menkhoff, Jan, Kaufmann, Michael, Morgenstern, Burkhard
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1087830/
https://www.ncbi.nlm.nih.gov/pubmed/15784139
http://dx.doi.org/10.1186/1471-2105-6-66
collection PubMed
description BACKGROUND: We present a complete re-implementation of the segment-based approach to multiple protein alignment that contains a number of improvements compared to the previous version 2.2 of DIALIGN. This previous version is superior to Needleman-Wunsch-based multi-alignment programs on locally related sequence sets. However, it is often outperformed by these methods on data sets with global but weak similarity at the primary-sequence level. RESULTS: In the present paper, we discuss strengths and weaknesses of DIALIGN in view of the underlying objective function. Based on these results, we propose several heuristics to improve the segment-based alignment approach. For pairwise alignment, we implemented a fragment-chaining algorithm that favours chains of low-scoring local alignments over isolated high-scoring fragments. For multiple alignment, we use an improved greedy procedure that is less sensitive to spurious local sequence similarities. To evaluate our method on globally related protein families, we used the well-known database BAliBASE. For benchmarking tests on locally related sequences, we created a new reference database called IRMBASE which consists of simulated conserved motifs implanted into non-related random sequences. CONCLUSION: On BAliBASE, our new program performs significantly better than the previous version of DIALIGN and is comparable to the standard global aligner CLUSTAL W, though it is outperformed by some newly developed programs that focus on global alignment. On the locally related test sets in IRMBASE, our method outperforms all other programs that we evaluated.
id pubmed-1087830
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-1087830 2005-04-30 BMC Bioinformatics Research Article BioMed Central 2005-03-22 /pmc/articles/PMC1087830/ /pubmed/15784139 http://dx.doi.org/10.1186/1471-2105-6-66 Text en Copyright © 2005 Subramanian et al; licensee BioMed Central Ltd.
topic Research Article
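The abstract mentions a fragment-chaining algorithm that favours chains of low-scoring local alignments over isolated high-scoring fragments. The Python sketch below is a rough, hypothetical illustration of that trade-off using generic maximum-weight fragment chaining. It is not DIALIGN-T's algorithm. It assumes fragments are gap-free local alignments with a precomputed weight. The `Fragment` type, the `precedes` and `best_chain` functions, and all weights are invented for this example, and DIALIGN's own weight function and heuristics are not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Fragment:
    """Hypothetical gap-free local alignment: seq1[i:i+length] aligned with seq2[j:j+length]."""
    i: int         # start in sequence 1 (0-based)
    j: int         # start in sequence 2 (0-based)
    length: int    # number of aligned residue pairs, assumed >= 1
    weight: float  # fragment score (illustrative only; not DIALIGN's weight function)


def precedes(a: Fragment, b: Fragment) -> bool:
    """True if a ends before b starts in BOTH sequences, so a and b can share a chain."""
    return a.i + a.length <= b.i and a.j + a.length <= b.j


def best_chain(fragments: List[Fragment]) -> Tuple[float, List[Fragment]]:
    """Maximum-weight chain of mutually consistent fragments, via O(n^2) dynamic programming."""
    if not fragments:
        return 0.0, []
    # A valid predecessor starts strictly earlier in sequence 1 (length >= 1),
    # so sorting by i guarantees predecessors are processed first.
    frags = sorted(fragments, key=lambda f: (f.i, f.j))
    score = [f.weight for f in frags]  # best chain weight ending at frags[k]
    prev = [-1] * len(frags)           # back-pointer to the previous fragment in that chain
    for k, f in enumerate(frags):
        for m in range(k):
            if precedes(frags[m], f) and score[m] + f.weight > score[k]:
                score[k] = score[m] + f.weight
                prev[k] = m
    best = max(range(len(frags)), key=score.__getitem__)
    total = score[best]
    chain = []
    while best != -1:  # follow back-pointers from the best chain end
        chain.append(frags[best])
        best = prev[best]
    chain.reverse()
    return total, chain


# One strong fragment (A) that conflicts with three weaker, mutually consistent ones.
A = Fragment(i=0, j=30, length=30, weight=5.0)
B = Fragment(i=0, j=0, length=5, weight=2.0)
C = Fragment(i=12, j=12, length=5, weight=2.0)
D = Fragment(i=25, j=25, length=5, weight=2.0)
total, chain = best_chain([A, B, C, D])
print(total)  # → 6.0: the chain B, C, D outweighs the isolated fragment A
```

Because a chain's score here is a simple sum, several weakly similar segments on consistent diagonals can outweigh a single high-scoring fragment they conflict with. The specific heuristic DIALIGN-T uses to favour such chains is described in the paper and is not reproduced by this sketch.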