
MaxAlign: maximizing usable data in an alignment

BACKGROUND: The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. RESULTS: MaxAlign is a program that optimizes t...


Bibliographic Details
Main authors: Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G
Format: Text
Language: English
Published: BioMed Central 2007
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2000915/
https://www.ncbi.nlm.nih.gov/pubmed/17725821
http://dx.doi.org/10.1186/1471-2105-8-312
author Gouveia-Oliveira, Rodrigo
Sackett, Peter W
Pedersen, Anders G
collection PubMed
description BACKGROUND: The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. RESULTS: MaxAlign is a program that optimizes the alignment prior to such analyses. Specifically, it maximizes the number of nucleotide (or amino acid) symbols that are present in gap-free columns – the alignment area – by selecting the optimal subset of sequences to exclude from the alignment. MaxAlign can be used prior to phylogenetic and bioinformatical analyses as well as in other situations where this form of alignment improvement is useful. In this work we test MaxAlign's performance in these tasks and compare the accuracy of phylogenetic estimates including and excluding gapped columns from the analysis, with and without processing with MaxAlign. In this paper we also introduce a new simple measure of tree similarity, Normalized Symmetric Similarity (NSS) that we consider useful for comparing tree topologies. CONCLUSION: We demonstrate how MaxAlign is helpful in detecting misaligned or defective sequences without requiring manual inspection. We also show that it is not advisable to exclude gapped columns from phylogenetic analyses unless MaxAlign is used first. Finally, we find that the sequences removed by MaxAlign from an alignment tend to be those that would otherwise be associated with low phylogenetic accuracy, and that the presence of gaps in any given sequence does not seem to disturb the phylogenetic estimates of other sequences. The MaxAlign web-server is freely available online at http://www.cbs.dtu.dk/services/MaxAlign where supplementary information can also be found. The program is also freely available as a Perl stand-alone package.
format Text
id pubmed-2000915
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2000915 2007-10-05 MaxAlign: maximizing usable data in an alignment. Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G. BMC Bioinformatics, Methodology Article. BioMed Central 2007-08-28 /pmc/articles/PMC2000915/ /pubmed/17725821 http://dx.doi.org/10.1186/1471-2105-8-312 Text en Copyright © 2007 Gouveia-Oliveira et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title MaxAlign: maximizing usable data in an alignment
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2000915/
https://www.ncbi.nlm.nih.gov/pubmed/17725821
http://dx.doi.org/10.1186/1471-2105-8-312
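
The abstract in this record defines the alignment area as the number of nucleotide or amino acid symbols that sit in gap-free columns, and describes MaxAlign as choosing which sequences to exclude so that this area is as large as possible. The sketch below illustrates that objective on a toy alignment. It is not the MaxAlign program or its search strategy: it assumes "-" as the gap symbol, uses hypothetical helper names (`alignment_area`, `best_subset`), and tries every subset exhaustively, which is only feasible for a handful of sequences.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol; real alignments may also use "." or "?"


def alignment_area(seqs):
    """Count symbols in gap-free columns: (number of rows) x (gap-free columns)."""
    if not seqs:
        return 0
    gap_free_columns = sum(1 for column in zip(*seqs) if GAP not in column)
    return len(seqs) * gap_free_columns


def best_subset(alignment):
    """Brute-force search for the set of sequences with the largest alignment area.

    alignment: dict mapping sequence name -> aligned sequence (all equal length).
    Returns (names kept, alignment area). Ties favour keeping more sequences.
    """
    names = list(alignment)
    best_names = names
    best_area = alignment_area([alignment[n] for n in names])
    # Try progressively smaller subsets; only a strictly larger area replaces the best.
    for size in range(len(names) - 1, 0, -1):
        for kept in combinations(names, size):
            area = alignment_area([alignment[n] for n in kept])
            if area > best_area:
                best_names, best_area = list(kept), area
    return best_names, best_area


# Toy alignment (made-up data for illustration only).
toy = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTACGA",
    "seq3": "AC--ACGT",
    "seq4": "----AC--",
}

kept, area = best_subset(toy)
print(kept, area)  # prints ['seq1', 'seq2', 'seq3'] 18
```

With all four sequences only two columns are gap-free, giving an area of 8. Dropping the heavily gapped `seq4` leaves six gap-free columns across three sequences, an area of 18, which is larger than the area of 16 obtained by also dropping `seq3`. This shows why the objective counts symbols rather than columns alone.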