Noisy: Identification of problematic columns in multiple sequence alignments

MOTIVATION: Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations...

Bibliographic Details
Main authors:	Dress, Andreas WM; Flamm, Christoph; Fritzsch, Guido; Grünewald, Stefan; Kruspe, Matthias; Prohaska, Sonja J; Stadler, Peter F
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Software Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2464588/ https://www.ncbi.nlm.nih.gov/pubmed/18577231 http://dx.doi.org/10.1186/1748-7188-3-7

author	Dress, Andreas WM; Flamm, Christoph; Fritzsch, Guido; Grünewald, Stefan; Kruspe, Matthias; Prohaska, Sonja J; Stadler, Peter F
collection	PubMed
description	MOTIVATION: Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations in substitution rates that are not uncorrelated across the sequence, this often leads to a patchwork pattern of (i) phylogenetically informative and (ii) effectively randomized regions. Furthermore, alignment errors accumulate in highly variable regions, sometimes resulting in misleading signals in phylogenetic reconstruction. RESULTS: We present here a method that, based on assessing the distribution of character states along a cyclic ordering of the taxa, allows the identification of phylogenetically uninformative homoplastic sites in a multiple sequence alignment. Removal of these sites appears to improve the performance of phylogenetic reconstruction algorithms as measured by various indices of "tree quality". In particular, we obtain more stable trees due to the exclusion of phylogenetically incompatible sites that most likely represent strongly randomized characters. SOFTWARE: The computer program noisy implements this approach. It can be employed to improve phylogenetic reconstruction with a considerable success rate whenever (1) the average bootstrap support obtained from the original alignment is low, and (2) there are sufficiently many taxa in the data set – at least, say, 12 to 15 taxa. The software can be obtained under the GNU Public License from .
format	Text
id	pubmed-2464588
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2464588 2008-07-15 Noisy: Identification of problematic columns in multiple sequence alignments Algorithms Mol Biol Software Article BioMed Central 2008-06-24 /pmc/articles/PMC2464588/ /pubmed/18577231 http://dx.doi.org/10.1186/1748-7188-3-7 Text en Copyright © 2008 Dress et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Noisy: Identification of problematic columns in multiple sequence alignments
topic	Software Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2464588/ https://www.ncbi.nlm.nih.gov/pubmed/18577231 http://dx.doi.org/10.1186/1748-7188-3-7
