TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees

Bibliographic Details
Main authors: Mai, Uyen; Mirarab, Siavash
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998883/
https://www.ncbi.nlm.nih.gov/pubmed/29745847
http://dx.doi.org/10.1186/s12864-018-4620-2
_version_ 1783331322070040576
author Mai, Uyen
Mirarab, Siavash
author_facet Mai, Uyen
Mirarab, Siavash
author_sort Mai, Uyen
collection PubMed
description BACKGROUND: Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically, errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. RESULTS: We propose an automatic method to detect such errors. We build a phylogeny including all the data and then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. CONCLUSIONS: TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-4620-2) contains supplementary material, which is available to authorized users.
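The k-shrink problem stated in the description can be illustrated on a toy tree. The sketch below is not the authors' polynomial-time algorithm and does not use the TreeShrink code; it is a brute-force illustration that tries every set of k leaves. It assumes the tree is given as a list of weighted edges and that the diameter is the largest path length between two remaining leaves. The function names and the toy tree are hypothetical.

```python
from itertools import combinations


def leaf_distances(edges):
    """Return the leaves of a weighted tree and all leaf-to-leaf path lengths."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    dist = {}
    for src in leaves:
        # Depth-first walk from src; a tree has exactly one path to each node.
        stack = [(src, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if node != src and len(adj[node]) == 1:
                dist[(src, node)] = d
            for nxt, w in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, d + w))
    return leaves, dist


def k_shrink_brute_force(edges, k):
    """Try every set of k leaves and keep the one whose removal leaves the
    smallest diameter. Exponential in k, so only suitable for tiny examples."""
    leaves, dist = leaf_distances(edges)

    def diameter_without(removed):
        kept = [x for x in leaves if x not in removed]
        # Removing leaves does not change distances between the kept leaves.
        return max((dist[(a, b)] for a, b in combinations(kept, 2)), default=0.0)

    best = min(combinations(leaves, k), key=diameter_without)
    return set(best), diameter_without(best)


# Hypothetical toy tree: leaf D sits on a branch ten times longer than the rest.
toy_tree = [("A", "x", 1.0), ("B", "x", 1.0), ("x", "y", 1.0),
            ("C", "y", 1.0), ("D", "y", 10.0)]
print(k_shrink_brute_force(toy_tree, 1))  # ({'D'}, 3.0)
```

In this toy tree, removing D lowers the diameter from 12.0 to 3.0, which is the kind of disproportionate impact the statistical tests in the description are meant to flag.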
format Online
Article
Text
id pubmed-5998883
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5998883 2018-06-25 TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees Mai, Uyen Mirarab, Siavash BMC Genomics Research
BioMed Central 2018-05-08 /pmc/articles/PMC5998883/ /pubmed/29745847 http://dx.doi.org/10.1186/s12864-018-4620-2 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Mai, Uyen
Mirarab, Siavash
TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees
title TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees
title_full TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees
title_fullStr TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees
title_full_unstemmed TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees
title_short TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees
title_sort treeshrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998883/
https://www.ncbi.nlm.nih.gov/pubmed/29745847
http://dx.doi.org/10.1186/s12864-018-4620-2
work_keys_str_mv AT maiuyen treeshrinkfastandaccuratedetectionofoutlierlongbranchesincollectionsofphylogenetictrees
AT mirarabsiavash treeshrinkfastandaccuratedetectionofoutlierlongbranchesincollectionsofphylogenetictrees