
Efficient genome-scale phylogenetic analysis under the duplication-loss and deep coalescence cost models

BACKGROUND: Genomic data provide a wealth of new information for phylogenetic analysis. Yet making use of this data requires phylogenetic methods that can efficiently analyze extremely large data sets and account for processes of gene evolution, such as gene duplication and loss, incomplete lineage...


Bibliographic Details
Main Authors:	Bansal, Mukul S; Burleigh, J Gordon; Eulenstein, Oliver
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009515/ https://www.ncbi.nlm.nih.gov/pubmed/20122216 http://dx.doi.org/10.1186/1471-2105-11-S1-S42

author	Bansal, Mukul S; Burleigh, J Gordon; Eulenstein, Oliver
collection	PubMed
description	BACKGROUND: Genomic data provide a wealth of new information for phylogenetic analysis. Yet making use of this data requires phylogenetic methods that can efficiently analyze extremely large data sets and account for processes of gene evolution, such as gene duplication and loss, incomplete lineage sorting (deep coalescence), or horizontal gene transfer, that cause incongruence among gene trees. One such approach is gene tree parsimony, which, given a set of gene trees, seeks a species tree that requires the smallest number of evolutionary events to explain the incongruence of the gene trees. However, the only existing algorithms for gene tree parsimony under the duplication-loss or deep coalescence reconciliation cost are prohibitively slow for large datasets. RESULTS: We describe novel algorithms for SPR and TBR based local search heuristics under the duplication-loss cost, and we show how they can be adapted for the deep coalescence cost. These algorithms improve upon the best existing algorithms for these problems by a factor of n, where n is the number of species in the collection of gene trees. We implemented our new SPR based local search algorithm for the duplication-loss cost and demonstrate the tremendous improvement in runtime and scalability it provides compared to existing implementations. We also evaluate the performance of our algorithm on three large-scale genomic data sets. CONCLUSION: Our new algorithms enable, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication-loss and deep coalescence reconciliation costs. Thus, this work expands both the size of data sets and the range of evolutionary models that can be incorporated into genome-scale phylogenetic analyses.
format	Text
id	pubmed-3009515
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3009515 2010-12-23 Efficient genome-scale phylogenetic analysis under the duplication-loss and deep coalescence cost models Bansal, Mukul S; Burleigh, J Gordon; Eulenstein, Oliver BMC Bioinformatics Research
BioMed Central 2010-01-18 /pmc/articles/PMC3009515/ /pubmed/20122216 http://dx.doi.org/10.1186/1471-2105-11-S1-S42 Text en Copyright ©2010 Bansal et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Efficient genome-scale phylogenetic analysis under the duplication-loss and deep coalescence cost models
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009515/ https://www.ncbi.nlm.nih.gov/pubmed/20122216 http://dx.doi.org/10.1186/1471-2105-11-S1-S42
