Multiple sequence alignment accuracy and evolutionary distance estimation

BACKGROUND: Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pairwise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation...

Bibliographic Details
Main author: Rosenberg, Michael S
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1318491/
https://www.ncbi.nlm.nih.gov/pubmed/16305750
http://dx.doi.org/10.1186/1471-2105-6-278
_version_ 1782126426039779328
author Rosenberg, Michael S
collection PubMed
description BACKGROUND: Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pairwise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation analysis. This study used sequence simulation to examine the gain in accuracy of adding a third sequence to a pairwise alignment, particularly concentrating on how the phylogenetic position of the additional sequence relative to the first pair changes the accuracy of the initial pair's alignment as well as their estimated evolutionary distance. RESULTS: The maximal gain in alignment accuracy was found not when the third sequence is directly intermediate between the initial two sequences, but rather when it perfectly subdivides the branch leading from the root of the tree to one of the original sequences (making it half as close to one sequence as the other). Evolutionary distance estimation in the multiple alignment framework, however, is largely unrelated to alignment accuracy and rather is dependent on the position of the third sequence; the closer the branch leading to the third sequence is to the root of the tree, the larger the estimated distance between the first two sequences. CONCLUSION: The bias in distance estimation appears to be a direct result of the standard greedy progressive algorithm used by many multiple alignment methods. These results have implications for choosing new taxa and genomes to sequence when resources are limited.
format Text
id pubmed-1318491
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1318491 2005-12-22 Multiple sequence alignment accuracy and evolutionary distance estimation Rosenberg, Michael S BMC Bioinformatics Research Article BACKGROUND: Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pairwise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation analysis. This study used sequence simulation to examine the gain in accuracy of adding a third sequence to a pairwise alignment, particularly concentrating on how the phylogenetic position of the additional sequence relative to the first pair changes the accuracy of the initial pair's alignment as well as their estimated evolutionary distance. RESULTS: The maximal gain in alignment accuracy was found not when the third sequence is directly intermediate between the initial two sequences, but rather when it perfectly subdivides the branch leading from the root of the tree to one of the original sequences (making it half as close to one sequence as the other). Evolutionary distance estimation in the multiple alignment framework, however, is largely unrelated to alignment accuracy and rather is dependent on the position of the third sequence; the closer the branch leading to the third sequence is to the root of the tree, the larger the estimated distance between the first two sequences. CONCLUSION: The bias in distance estimation appears to be a direct result of the standard greedy progressive algorithm used by many multiple alignment methods. These results have implications for choosing new taxa and genomes to sequence when resources are limited. BioMed Central 2005-11-23 /pmc/articles/PMC1318491/ /pubmed/16305750 http://dx.doi.org/10.1186/1471-2105-6-278 Text en Copyright © 2005 Rosenberg; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Rosenberg, Michael S
Multiple sequence alignment accuracy and evolutionary distance estimation
title Multiple sequence alignment accuracy and evolutionary distance estimation
title_sort multiple sequence alignment accuracy and evolutionary distance estimation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1318491/
https://www.ncbi.nlm.nih.gov/pubmed/16305750
http://dx.doi.org/10.1186/1471-2105-6-278
work_keys_str_mv AT rosenbergmichaels multiplesequencealignmentaccuracyandevolutionarydistanceestimation