MaxAlign: maximizing usable data in an alignment
BACKGROUND: The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. RESULTS: MaxAlign is a program that optimizes t...
Main authors: | Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2007 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2000915/ https://www.ncbi.nlm.nih.gov/pubmed/17725821 http://dx.doi.org/10.1186/1471-2105-8-312 |
_version_ | 1782135561044099072 |
---|---|
author | Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G |
author_facet | Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G |
author_sort | Gouveia-Oliveira, Rodrigo |
collection | PubMed |
description | BACKGROUND: The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. RESULTS: MaxAlign is a program that optimizes the alignment prior to such analyses. Specifically, it maximizes the number of nucleotide (or amino acid) symbols that are present in gap-free columns (the alignment area) by selecting the optimal subset of sequences to exclude from the alignment. MaxAlign can be used prior to phylogenetic and bioinformatical analyses as well as in other situations where this form of alignment improvement is useful. In this work we test MaxAlign's performance in these tasks and compare the accuracy of phylogenetic estimates including and excluding gapped columns from the analysis, with and without processing with MaxAlign. We also introduce a new, simple measure of tree similarity, Normalized Symmetric Similarity (NSS), that we consider useful for comparing tree topologies. CONCLUSION: We demonstrate how MaxAlign is helpful in detecting misaligned or defective sequences without requiring manual inspection. We also show that it is not advisable to exclude gapped columns from phylogenetic analyses unless MaxAlign is used first. Finally, we find that the sequences removed by MaxAlign from an alignment tend to be those that would otherwise be associated with low phylogenetic accuracy, and that the presence of gaps in any given sequence does not seem to disturb the phylogenetic estimates of other sequences. The MaxAlign web-server is freely available online at http://www.cbs.dtu.dk/services/MaxAlign, where supplementary information can also be found. The program is also freely available as a Perl stand-alone package. |
format | Text |
id | pubmed-2000915 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2000915 2007-10-05 MaxAlign: maximizing usable data in an alignment Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G BMC Bioinformatics Methodology Article BioMed Central 2007-08-28 /pmc/articles/PMC2000915/ /pubmed/17725821 http://dx.doi.org/10.1186/1471-2105-8-312 Text en Copyright © 2007 Gouveia-Oliveira et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Gouveia-Oliveira, Rodrigo Sackett, Peter W Pedersen, Anders G MaxAlign: maximizing usable data in an alignment |
title | MaxAlign: maximizing usable data in an alignment |
title_full | MaxAlign: maximizing usable data in an alignment |
title_fullStr | MaxAlign: maximizing usable data in an alignment |
title_full_unstemmed | MaxAlign: maximizing usable data in an alignment |
title_short | MaxAlign: maximizing usable data in an alignment |
title_sort | maxalign: maximizing usable data in an alignment |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2000915/ https://www.ncbi.nlm.nih.gov/pubmed/17725821 http://dx.doi.org/10.1186/1471-2105-8-312 |
work_keys_str_mv | AT gouveiaoliveirarodrigo maxalignmaximizingusabledatainanalignment AT sackettpeterw maxalignmaximizingusabledatainanalignment AT pedersenandersg maxalignmaximizingusabledatainanalignment |
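
The "alignment area" named in the description (the number of symbols lying in gap-free columns) can be made concrete with a short sketch. The code below is illustrative only and is not the MaxAlign program: `alignment_area` and `greedy_exclude` are hypothetical names, the gap characters `-` and `.` are an assumption, all rows are assumed to have equal length, and the greedy loop is a simple stand-in heuristic rather than the optimization method used by the authors.

```python
from typing import List, Sequence, Tuple

# Assumption: gaps are written as '-' or '.'; adjust for other conventions.
GAP_CHARS = frozenset("-.")


def alignment_area(rows: Sequence[str]) -> int:
    """Count symbols in gap-free columns: (gap-free columns) x (sequences)."""
    if not rows:
        return 0
    gap_free_columns = sum(
        1 for column in zip(*rows) if not any(ch in GAP_CHARS for ch in column)
    )
    return gap_free_columns * len(rows)


def greedy_exclude(rows: Sequence[str]) -> Tuple[List[int], int]:
    """Hypothetical heuristic: repeatedly drop the single sequence whose
    removal most increases the alignment area; stop when no removal helps.

    Returns the indices of the kept sequences and the resulting area.
    """
    kept = list(range(len(rows)))
    best = alignment_area([rows[i] for i in kept])
    while len(kept) > 1:
        # Area obtained by leaving out each remaining sequence in turn.
        candidates = [
            (alignment_area([rows[j] for j in kept if j != i]), i) for i in kept
        ]
        area, drop = max(candidates, key=lambda c: c[0])
        if area <= best:
            break  # no single removal improves the area
        best = area
        kept.remove(drop)
    return kept, best


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTAC-T", "A--T----", "ACGTACGT"]
    print(alignment_area(toy))   # 2 gap-free columns x 4 sequences = 8
    print(greedy_exclude(toy))   # ([0, 1, 3], 21): 7 columns x 3 sequences
```

In the toy alignment, excluding the heavily gapped third sequence raises the area from 8 to 21 even though one sequence is lost. A one-at-a-time greedy search like this can miss cases where two sequences must be removed together before the area improves, so it should be read as an illustration of the objective only.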