An investigation of irreproducibility in maximum likelihood phylogenetic inference
Phylogenetic trees are essential for studying biology, but their reproducibility under identical parameter settings remains unexplored. Here, we find that 3515 (18.11%) IQ-TREE-inferred and 1813 (9.34%) RAxML-NG-inferred maximum likelihood (ML) gene trees are topologically irreproducible when execut...
Main Authors: | Shen, Xing-Xing; Li, Yuanning; Hittinger, Chris Todd; Chen, Xue-xin; Rokas, Antonis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7705714/ https://www.ncbi.nlm.nih.gov/pubmed/33257660 http://dx.doi.org/10.1038/s41467-020-20005-6 |
_version_ | 1783617001509355520 |
---|---|
author | Shen, Xing-Xing; Li, Yuanning; Hittinger, Chris Todd; Chen, Xue-xin; Rokas, Antonis |
author_facet | Shen, Xing-Xing; Li, Yuanning; Hittinger, Chris Todd; Chen, Xue-xin; Rokas, Antonis |
author_sort | Shen, Xing-Xing |
collection | PubMed |
description | Phylogenetic trees are essential for studying biology, but their reproducibility under identical parameter settings remains unexplored. Here, we find that 3515 (18.11%) IQ-TREE-inferred and 1813 (9.34%) RAxML-NG-inferred maximum likelihood (ML) gene trees are topologically irreproducible when executing two replicates (Run1 and Run2) for each of 19,414 gene alignments in 15 animal, plant, and fungal phylogenomic datasets. Notably, coalescent-based ASTRAL species phylogenies inferred from Run1 and Run2 sets of individual gene trees are topologically irreproducible for 9/15 phylogenomic datasets, whereas concatenation-based phylogenies inferred twice from the same supermatrix are reproducible. Our simulations further show that irreproducible phylogenies are more likely to be incorrect than reproducible phylogenies. These results suggest that a considerable fraction of single-gene ML trees may be irreproducible. Increasing reproducibility in ML inference will benefit from providing analyses’ log files, which contain typically reported parameters (e.g., program, substitution model, number of tree searches) but also typically unreported ones (e.g., random starting seed number, number of threads, processor type). |
format | Online Article Text |
id | pubmed-7705714 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-77057142020-12-03 An investigation of irreproducibility in maximum likelihood phylogenetic inference Shen, Xing-Xing Li, Yuanning Hittinger, Chris Todd Chen, Xue-xin Rokas, Antonis Nat Commun Article Phylogenetic trees are essential for studying biology, but their reproducibility under identical parameter settings remains unexplored. Here, we find that 3515 (18.11%) IQ-TREE-inferred and 1813 (9.34%) RAxML-NG-inferred maximum likelihood (ML) gene trees are topologically irreproducible when executing two replicates (Run1 and Run2) for each of 19,414 gene alignments in 15 animal, plant, and fungal phylogenomic datasets. Notably, coalescent-based ASTRAL species phylogenies inferred from Run1 and Run2 sets of individual gene trees are topologically irreproducible for 9/15 phylogenomic datasets, whereas concatenation-based phylogenies inferred twice from the same supermatrix are reproducible. Our simulations further show that irreproducible phylogenies are more likely to be incorrect than reproducible phylogenies. These results suggest that a considerable fraction of single-gene ML trees may be irreproducible. Increasing reproducibility in ML inference will benefit from providing analyses’ log files, which contain typically reported parameters (e.g., program, substitution model, number of tree searches) but also typically unreported ones (e.g., random starting seed number, number of threads, processor type). Nature Publishing Group UK 2020-11-30 /pmc/articles/PMC7705714/ /pubmed/33257660 http://dx.doi.org/10.1038/s41467-020-20005-6 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Shen, Xing-Xing Li, Yuanning Hittinger, Chris Todd Chen, Xue-xin Rokas, Antonis An investigation of irreproducibility in maximum likelihood phylogenetic inference |
title | An investigation of irreproducibility in maximum likelihood phylogenetic inference |
title_sort | investigation of irreproducibility in maximum likelihood phylogenetic inference |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7705714/ https://www.ncbi.nlm.nih.gov/pubmed/33257660 http://dx.doi.org/10.1038/s41467-020-20005-6 |
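The abstract above describes gene trees from two replicate runs as "topologically irreproducible" when their topologies differ. As a rough illustration only, one common way to test whether two unrooted trees share a topology is to compare their sets of non-trivial bipartitions (the idea behind the Robinson-Foulds distance). This record does not state which comparison the authors used, so the sketch below is an assumption, and the function names (`parse_newick`, `bipartitions`, `same_topology`) and the toy Newick strings are hypothetical. The parser is deliberately minimal and does not handle quoted labels.

```python
def parse_newick(s):
    """Parse a Newick string into nested lists of leaf names.

    Branch lengths and internal node labels are discarded.
    Minimal sketch: quoted labels and comments are not supported.
    """
    s = s.strip().rstrip(";")
    pos = 0

    def label():
        # Read a label (possibly with ":length") and return the name part.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        return s[start:pos].split(":")[0].strip()

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # skip "("
            children = [node()]
            while s[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
            label()  # discard internal label, support value, branch length
            return children
        return label()

    return node()


def bipartitions(tree):
    """Return (taxon set, set of non-trivial splits) for a parsed tree."""
    def leaves(n):
        if isinstance(n, str):
            return frozenset([n])
        return frozenset().union(*(leaves(c) for c in n))

    all_taxa = leaves(tree)
    anchor = min(all_taxa)  # fixed taxon used to normalise each split
    splits = set()

    def visit(n):
        if isinstance(n, str):
            return frozenset([n])
        clade = frozenset().union(*(visit(c) for c in n))
        # Store the side of the split that does NOT contain the anchor,
        # so rooting and child order do not affect the result.
        side = all_taxa - clade if anchor in clade else clade
        if 1 < len(side) < len(all_taxa) - 1:  # keep non-trivial splits only
            splits.add(side)
        return clade

    visit(tree)
    return all_taxa, splits


def same_topology(newick1, newick2):
    """True if both trees have identical unrooted topologies."""
    taxa1, splits1 = bipartitions(parse_newick(newick1))
    taxa2, splits2 = bipartitions(parse_newick(newick2))
    if taxa1 != taxa2:
        raise ValueError("trees must share the same taxon set")
    return splits1 == splits2


# Toy example (made-up trees, not data from the study)
run1 = "((A:0.1,B:0.2):0.05,(C:0.1,D:0.3):0.05);"
run2 = "(A:0.1,B:0.2,(D:0.3,C:0.1):0.1);"   # same topology, different rooting
run3 = "((A,C),(B,D));"                      # different topology

print(same_topology(run1, run2))  # True
print(same_topology(run1, run3))  # False
```

Comparing split sets rather than raw Newick strings matters here because two runs can write the same topology with different rootings, child orders, or branch lengths; a plain string comparison would overcount irreproducible trees.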