
Does the choice of nucleotide substitution models matter topologically?

BACKGROUND: In the context of a master-level programming practical at the computer science department of the Karlsruhe Institute of Technology, we developed, and make available, open-source code for testing all 203 possible nucleotide substitution models in the Maximum Likelihood (ML) setting under...


Bibliographic Details
Main authors: Hoff, Michael, Orf, Stefan, Riehm, Benedikt, Darriba, Diego, Stamatakis, Alexandros
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4806516/
https://www.ncbi.nlm.nih.gov/pubmed/27009141
http://dx.doi.org/10.1186/s12859-016-0985-x
author Hoff, Michael
Orf, Stefan
Riehm, Benedikt
Darriba, Diego
Stamatakis, Alexandros
collection PubMed
description BACKGROUND: In the context of a master-level programming practical at the computer science department of the Karlsruhe Institute of Technology, we developed, and make available, open-source code for testing all 203 possible nucleotide substitution models in the Maximum Likelihood (ML) setting under the common Akaike (AIC), corrected Akaike (AICc), and Bayesian (BIC) information criteria. We address the question of whether model selection matters topologically, that is, whether conducting ML inferences under the optimal model, instead of a standard General Time Reversible (GTR) model, yields different tree topologies. We also assess to what degree the models selected and the trees inferred under the three standard criteria (AIC, AICc, BIC) differ. Finally, we assess whether the definition of the sample size (#sites versus #sites × #taxa) yields different models and, as a consequence, different tree topologies. RESULTS: We find that all three factors (in order of impact: nucleotide model selection, information criterion used, sample size definition) can yield substantially different final tree topologies (topological difference exceeding 10 %) for approximately 5 % of the tree inferences conducted on the 39 empirical datasets used in our study. CONCLUSIONS: We find that using the best-fit nucleotide substitution model may change the final ML tree topology compared to an inference under a default GTR model. The effect is less pronounced when comparing distinct information criteria. Nonetheless, in some cases we did obtain substantial topological differences. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0985-x) contains supplementary material, which is available to authorized users.
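The description above turns on two points that a small calculation can illustrate: why there are 203 candidate models, and how the sample size definition can change which model a criterion prefers. The following is a minimal sketch, not the authors' code. It assumes the usual reading that the 203 models correspond to the ways of grouping the six substitution rates of a time-reversible model into classes of equal rate (the Bell number of 6), and it uses the textbook formulas for AIC, AICc, and BIC. The function names, log-likelihoods, parameter counts, and dataset dimensions are all hypothetical values chosen for illustration.

```python
import math


def bell_number(n: int) -> int:
    """Count the set partitions of n items using the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def aicc(log_lik: float, k: int, n: int) -> float:
    """Corrected AIC; n is the sample size and must exceed k + 1."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


def best_model_by_bic(models: dict, n: int) -> str:
    """Return the name of the model with the lowest BIC for sample size n."""
    return min(models, key=lambda name: bic(*models[name], n))


# Six substitution rates grouped into rate classes (assumed interpretation).
n_models = bell_number(6)

# Hypothetical candidates: name -> (log-likelihood, number of free parameters).
candidates = {"simple": (-10000.0, 5), "rich": (-9990.0, 8)}

# Hypothetical alignment: 500 sites, 20 taxa.
sites, taxa = 500, 20
choice_sites = best_model_by_bic(candidates, sites)
choice_sites_taxa = best_model_by_bic(candidates, sites * taxa)

print(n_models, choice_sites, choice_sites_taxa)
```

With these made-up numbers the richer model wins when the sample size is the number of sites, while the simpler model wins when it is #sites × #taxa, because the BIC penalty per parameter grows with ln n. Whether such a switch also changes the inferred tree is the empirical question the study addresses.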
format Online
Article
Text
id pubmed-4806516
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4806516 2016-03-25 Does the choice of nucleotide substitution models matter topologically? Hoff, Michael; Orf, Stefan; Riehm, Benedikt; Darriba, Diego; Stamatakis, Alexandros. BMC Bioinformatics. Research Article.
BioMed Central 2016-03-24 /pmc/articles/PMC4806516/ /pubmed/27009141 http://dx.doi.org/10.1186/s12859-016-0985-x Text en © Hoff et al. 2016. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Does the choice of nucleotide substitution models matter topologically?
topic Research Article