
Multiple sequence alignment accuracy and evolutionary distance estimation

BACKGROUND: Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pair wise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation...


Bibliographic Details
Main Author:	Rosenberg, Michael S
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1318491/ https://www.ncbi.nlm.nih.gov/pubmed/16305750 http://dx.doi.org/10.1186/1471-2105-6-278

_version_	1782126426039779328
author	Rosenberg, Michael S
collection	PubMed
description	BACKGROUND: Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pair wise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation analysis. This study used sequence simulation to examine the gain in accuracy of adding a third sequence to a pair wise alignment, particularly concentrating on how the phylogenetic position of the additional sequence relative to the first pair changes the accuracy of the initial pair's alignment as well as their estimated evolutionary distance. RESULTS: The maximal gain in alignment accuracy was found not when the third sequence is directly intermediate between the initial two sequences, but rather when it perfectly subdivides the branch leading from the root of the tree to one of the original sequences (making it half as close to one sequence as the other). Evolutionary distance estimation in the multiple alignment framework, however, is largely unrelated to alignment accuracy and rather is dependent on the position of the third sequence; the closer the branch leading to the third sequence is to the root of the tree, the larger the estimated distance between the first two sequences. CONCLUSION: The bias in distance estimation appears to be a direct result of the standard greedy progressive algorithm used by many multiple alignment methods. These results have implications for choosing new taxa and genomes to sequence when resources are limited.
format	Text
id	pubmed-1318491
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-13184912005-12-22 Multiple sequence alignment accuracy and evolutionary distance estimation Rosenberg, Michael S BMC Bioinformatics Research Article BACKGROUND: Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pair wise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation analysis. This study used sequence simulation to examine the gain in accuracy of adding a third sequence to a pair wise alignment, particularly concentrating on how the phylogenetic position of the additional sequence relative to the first pair changes the accuracy of the initial pair's alignment as well as their estimated evolutionary distance. RESULTS: The maximal gain in alignment accuracy was found not when the third sequence is directly intermediate between the initial two sequences, but rather when it perfectly subdivides the branch leading from the root of the tree to one of the original sequences (making it half as close to one sequence as the other). Evolutionary distance estimation in the multiple alignment framework, however, is largely unrelated to alignment accuracy and rather is dependent on the position of the third sequence; the closer the branch leading to the third sequence is to the root of the tree, the larger the estimated distance between the first two sequences. CONCLUSION: The bias in distance estimation appears to be a direct result of the standard greedy progressive algorithm used by many multiple alignment methods. These results have implications for choosing new taxa and genomes to sequence when resources are limited. BioMed Central 2005-11-23 /pmc/articles/PMC1318491/ /pubmed/16305750 http://dx.doi.org/10.1186/1471-2105-6-278 Text en Copyright © 2005 Rosenberg; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Rosenberg, Michael S Multiple sequence alignment accuracy and evolutionary distance estimation
title	Multiple sequence alignment accuracy and evolutionary distance estimation
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1318491/ https://www.ncbi.nlm.nih.gov/pubmed/16305750 http://dx.doi.org/10.1186/1471-2105-6-278
work_keys_str_mv	AT rosenbergmichaels multiplesequencealignmentaccuracyandevolutionarydistanceestimation

