Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance

Bibliographic Details
Main authors: Chaudhary, Ruchi; Burleigh, John Gordon; Fernández-Baca, David
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3874668/
https://www.ncbi.nlm.nih.gov/pubmed/24180377
http://dx.doi.org/10.1186/1748-7188-8-28
collection PubMed
description BACKGROUND: Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation. RESULTS: We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees. CONCLUSIONS: Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. 
Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplications and losses, as well as phylogenetic error, may contribute to gene tree discord. In experiments, the MulRF method estimated species trees accurately and quickly, demonstrating that MulRF is an efficient alternative approach for phylogenetic inference from large-scale genomic data sets.
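To make the distance and the objective in the description concrete, here is a minimal sketch of the classic Robinson-Foulds distance for rooted, singly-labeled trees (the size of the symmetric difference of their nontrivial clusters), plus the summed-distance score a candidate species tree receives against a collection of input trees. It is not the MulRF method: it rejects multi-labeled trees outright, and `best_candidate` scores a hand-supplied list of candidates by brute force instead of performing the SPR-based search the authors describe. The nested-tuple tree encoding and all function names are hypothetical, chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple, Union

# Hypothetical encoding: a rooted tree is either a leaf label (str)
# or a tuple of child subtrees, e.g. ((("A", "B"), "C"), "D").
Tree = Union[str, Tuple["Tree", ...]]


def leaf_labels(tree: Tree) -> List[str]:
    """List every leaf label, keeping duplicates (a mul-tree would repeat some)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaf_labels(child)]


def nontrivial_clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below internal nodes, minus singletons and the full leaf set."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    everything = walk(tree)
    return {c for c in found if 1 < len(c) < len(everything)}


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Classic rooted RF distance: size of the symmetric difference of clusters."""
    l1, l2 = leaf_labels(t1), leaf_labels(t2)
    # Assumption: singly-labeled trees only; the mul-tree case needs the generalization.
    if len(set(l1)) != len(l1) or len(set(l2)) != len(l2):
        raise ValueError("this sketch only handles singly-labeled trees")
    if set(l1) != set(l2):
        raise ValueError("both trees must have the same leaf set")
    return len(nontrivial_clusters(t1) ^ nontrivial_clusters(t2))


def total_rf(species_tree: Tree, gene_trees: Iterable[Tree]) -> int:
    """Singly-labeled analogue of the MulRF objective: summed RF distance."""
    return sum(rf_distance(g, species_tree) for g in gene_trees)


def best_candidate(candidates: List[Tree], gene_trees: List[Tree]) -> Tree:
    """Brute-force stand-in for a tree search: lowest total wins (first on ties)."""
    return min(candidates, key=lambda s: total_rf(s, gene_trees))


if __name__ == "__main__":
    genes = [
        ((("A", "B"), "C"), "D"),
        ((("A", "C"), "B"), "D"),
        ((("A", "B"), "D"), "C"),
    ]
    s1 = ((("A", "B"), "C"), "D")
    s2 = ((("A", "C"), "B"), "D")
    print(total_rf(s1, genes), total_rf(s2, genes))  # 4 6
    print(best_candidate([s2, s1], genes))  # the lower-scoring candidate, s1
```

In this toy input, s1 agrees exactly with the first gene tree and differs by two clusters from each of the others, so it scores 4 against 6 for s2. Enumerating candidates like this grows hopeless quickly with more taxa, which is why a local search over tree rearrangements such as SPR is the practical route for large data sets.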
id pubmed-3874668
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3874668 2013-12-31 Algorithms Mol Biol Research BioMed Central 2013-11-01 Copyright © 2013 Chaudhary et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research