Cargando…

The tree alignment problem

BACKGROUND: The inference of homologies among DNA sequences, that is, positions in multiple genomes that share a common evolutionary origin, is a crucial, yet difficult task facing biologists. Its computational counterpart is known as the multiple sequence alignment problem. There are various criter...

Descripción completa

Detalles Bibliográficos
Autores principales:	Varón, Andrés, Wheeler, Ward C
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2012
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3605350/ https://www.ncbi.nlm.nih.gov/pubmed/23140486 http://dx.doi.org/10.1186/1471-2105-13-293

_version_	1782263867916681216
author	Varón, Andrés Wheeler, Ward C
author_facet	Varón, Andrés Wheeler, Ward C
author_sort	Varón, Andrés
collection	PubMed
description	BACKGROUND: The inference of homologies among DNA sequences, that is, positions in multiple genomes that share a common evolutionary origin, is a crucial, yet difficult task facing biologists. Its computational counterpart is known as the multiple sequence alignment problem. There are various criteria and methods available to perform multiple sequence alignments, and among these, the minimization of the overall cost of the alignment on a phylogenetic tree is known in combinatorial optimization as the Tree Alignment Problem. This problem typically occurs as a subproblem of the Generalized Tree Alignment Problem, which looks for the tree with the lowest alignment cost among all possible trees. This is equivalent to the Maximum Parsimony problem when the input sequences are not aligned, that is, when phylogeny and alignments are simultaneously inferred. RESULTS: For large data sets, a popular heuristic is Direct Optimization (DO). DO provides a good tradeoff between speed, scalability, and competitive scores, and is implemented in the computer program POY. All other (competitive) algorithms have greater time complexities compared to DO. Here, we introduce and present experiments a new algorithm Affine-DO to accommodate the indel (alignment gap) models commonly used in phylogenetic analysis of molecular sequence data. Affine-DO has the same time complexity as DO, but is correctly suited for the affine gap edit distance. We demonstrate its performance with more than 330,000 experimental tests. These experiments show that the solutions of Affine-DO are close to the lower bound inferred from a linear programming solution. Moreover, iterating over a solution produced using Affine-DO shows little improvement. CONCLUSIONS: Our results show that Affine-DO is likely producing near-optimal solutions, with approximations within 10% for sequences with small divergence, and within 30% for random sequences, for which Affine-DO produced the worst solutions. The Affine-DO algorithm has the necessary scalability and optimality to be a significant improvement in the real-world phylogenetic analysis of sequence data.
format	Online Article Text
id	pubmed-3605350
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-36053502013-03-26 The tree alignment problem Varón, Andrés Wheeler, Ward C BMC Bioinformatics Research Article BACKGROUND: The inference of homologies among DNA sequences, that is, positions in multiple genomes that share a common evolutionary origin, is a crucial, yet difficult task facing biologists. Its computational counterpart is known as the multiple sequence alignment problem. There are various criteria and methods available to perform multiple sequence alignments, and among these, the minimization of the overall cost of the alignment on a phylogenetic tree is known in combinatorial optimization as the Tree Alignment Problem. This problem typically occurs as a subproblem of the Generalized Tree Alignment Problem, which looks for the tree with the lowest alignment cost among all possible trees. This is equivalent to the Maximum Parsimony problem when the input sequences are not aligned, that is, when phylogeny and alignments are simultaneously inferred. RESULTS: For large data sets, a popular heuristic is Direct Optimization (DO). DO provides a good tradeoff between speed, scalability, and competitive scores, and is implemented in the computer program POY. All other (competitive) algorithms have greater time complexities compared to DO. Here, we introduce and present experiments a new algorithm Affine-DO to accommodate the indel (alignment gap) models commonly used in phylogenetic analysis of molecular sequence data. Affine-DO has the same time complexity as DO, but is correctly suited for the affine gap edit distance. We demonstrate its performance with more than 330,000 experimental tests. These experiments show that the solutions of Affine-DO are close to the lower bound inferred from a linear programming solution. Moreover, iterating over a solution produced using Affine-DO shows little improvement. CONCLUSIONS: Our results show that Affine-DO is likely producing near-optimal solutions, with approximations within 10% for sequences with small divergence, and within 30% for random sequences, for which Affine-DO produced the worst solutions. The Affine-DO algorithm has the necessary scalability and optimality to be a significant improvement in the real-world phylogenetic analysis of sequence data. BioMed Central 2012-11-09 /pmc/articles/PMC3605350/ /pubmed/23140486 http://dx.doi.org/10.1186/1471-2105-13-293 Text en Copyright ©2012 Varón and Wheeler; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Varón, Andrés Wheeler, Ward C The tree alignment problem
title	The tree alignment problem
title_full	The tree alignment problem
title_fullStr	The tree alignment problem
title_full_unstemmed	The tree alignment problem
title_short	The tree alignment problem
title_sort	tree alignment problem
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3605350/ https://www.ncbi.nlm.nih.gov/pubmed/23140486 http://dx.doi.org/10.1186/1471-2105-13-293
work_keys_str_mv	AT varonandres thetreealignmentproblem AT wheelerwardc thetreealignmentproblem AT varonandres treealignmentproblem AT wheelerwardc treealignmentproblem

The tree alignment problem

Ejemplares similares