
An investigation of irreproducibility in maximum likelihood phylogenetic inference

Phylogenetic trees are essential for studying biology, but their reproducibility under identical parameter settings remains unexplored. Here, we find that 3515 (18.11%) IQ-TREE-inferred and 1813 (9.34%) RAxML-NG-inferred maximum likelihood (ML) gene trees are topologically irreproducible when execut...


Bibliographic Details
Main authors: Shen, Xing-Xing, Li, Yuanning, Hittinger, Chris Todd, Chen, Xue-xin, Rokas, Antonis
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7705714/
https://www.ncbi.nlm.nih.gov/pubmed/33257660
http://dx.doi.org/10.1038/s41467-020-20005-6
author Shen, Xing-Xing
Li, Yuanning
Hittinger, Chris Todd
Chen, Xue-xin
Rokas, Antonis
collection PubMed
description Phylogenetic trees are essential for studying biology, but their reproducibility under identical parameter settings remains unexplored. Here, we find that 3515 (18.11%) IQ-TREE-inferred and 1813 (9.34%) RAxML-NG-inferred maximum likelihood (ML) gene trees are topologically irreproducible when executing two replicates (Run1 and Run2) for each of 19,414 gene alignments in 15 animal, plant, and fungal phylogenomic datasets. Notably, coalescent-based ASTRAL species phylogenies inferred from Run1 and Run2 sets of individual gene trees are topologically irreproducible for 9/15 phylogenomic datasets, whereas concatenation-based phylogenies inferred twice from the same supermatrix are reproducible. Our simulations further show that irreproducible phylogenies are more likely to be incorrect than reproducible phylogenies. These results suggest that a considerable fraction of single-gene ML trees may be irreproducible. Increasing reproducibility in ML inference will benefit from providing analyses’ log files, which contain typically reported parameters (e.g., program, substitution model, number of tree searches) but also typically unreported ones (e.g., random starting seed number, number of threads, processor type).
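The percentages quoted in the description can be checked from the counts it reports. The following is a minimal sketch of that arithmetic only; the function and constant names are illustrative assumptions, not taken from the article, and the 60% figure for the ASTRAL comparison is derived here from the reported 9/15 rather than stated in the record.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)


# Counts as reported in the description (names are hypothetical).
TOTAL_ALIGNMENTS = 19_414        # gene alignments across 15 datasets
IQTREE_IRREPRODUCIBLE = 3_515    # IQ-TREE gene trees differing between Run1 and Run2
RAXMLNG_IRREPRODUCIBLE = 1_813   # RAxML-NG gene trees differing between Run1 and Run2

iqtree_pct = percent(IQTREE_IRREPRODUCIBLE, TOTAL_ALIGNMENTS)
raxmlng_pct = percent(RAXMLNG_IRREPRODUCIBLE, TOTAL_ALIGNMENTS)
astral_pct = percent(9, 15)      # datasets with irreproducible ASTRAL species trees

print(f"IQ-TREE:  {iqtree_pct}%")
print(f"RAxML-NG: {raxmlng_pct}%")
print(f"ASTRAL:   {astral_pct}%")
```

Both gene-tree percentages are taken over the same 19,414 alignments, so the two programs are directly comparable on this measure.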
format Online
Article
Text
id pubmed-7705714
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7705714 2020-12-03 Nat Commun Article Nature Publishing Group UK 2020-11-30 /pmc/articles/PMC7705714/ /pubmed/33257660 http://dx.doi.org/10.1038/s41467-020-20005-6 Text en
© The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title An investigation of irreproducibility in maximum likelihood phylogenetic inference
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7705714/
https://www.ncbi.nlm.nih.gov/pubmed/33257660
http://dx.doi.org/10.1038/s41467-020-20005-6