Performance of criteria for selecting evolutionary models in phylogenetics: a comprehensive study based on simulated datasets

BACKGROUND: Explicit evolutionary models are required in maximum-likelihood and Bayesian inference, the two methods that are overwhelmingly used in phylogenetic studies of DNA sequence data. Appropriate selection of nucleotide substitution models is important because the use of incorrect models can...

Bibliographic Details
Main Authors: Luo, Arong, Qiao, Huijie, Zhang, Yanzhou, Shi, Weifeng, Ho, Simon YW, Xu, Weijun, Zhang, Aibing, Zhu, Chaodong
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2925852/
https://www.ncbi.nlm.nih.gov/pubmed/20696057
http://dx.doi.org/10.1186/1471-2148-10-242
author Luo, Arong
Qiao, Huijie
Zhang, Yanzhou
Shi, Weifeng
Ho, Simon YW
Xu, Weijun
Zhang, Aibing
Zhu, Chaodong
author_sort Luo, Arong
collection PubMed
description BACKGROUND: Explicit evolutionary models are required in maximum-likelihood and Bayesian inference, the two methods that are overwhelmingly used in phylogenetic studies of DNA sequence data. Appropriate selection of nucleotide substitution models is important because the use of incorrect models can mislead phylogenetic inference. To better understand the performance of different model-selection criteria, we used 33,600 simulated data sets to analyse the accuracy, precision, dissimilarity, and biases of the hierarchical likelihood-ratio test, Akaike information criterion, Bayesian information criterion, and decision theory. RESULTS: We demonstrate that the Bayesian information criterion and decision theory are the most appropriate model-selection criteria because of their high accuracy and precision. Our results also indicate that in some situations different models are selected by different criteria for the same dataset. Such dissimilarity was the highest between the hierarchical likelihood-ratio test and Akaike information criterion, and lowest between the Bayesian information criterion and decision theory. The hierarchical likelihood-ratio test performed poorly when the true model included a proportion of invariable sites, while the Bayesian information criterion and decision theory generally exhibited similar performance to each other. CONCLUSIONS: Our results indicate that the Bayesian information criterion and decision theory should be preferred for model selection. Together with model-adequacy tests, accurate model selection will serve to improve the reliability of phylogenetic inference and related analyses.
format Text
id pubmed-2925852
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2925852 2010-08-24 Performance of criteria for selecting evolutionary models in phylogenetics: a comprehensive study based on simulated datasets Luo, Arong Qiao, Huijie Zhang, Yanzhou Shi, Weifeng Ho, Simon YW Xu, Weijun Zhang, Aibing Zhu, Chaodong BMC Evol Biol Research Article
BioMed Central 2010-08-09 /pmc/articles/PMC2925852/ /pubmed/20696057 http://dx.doi.org/10.1186/1471-2148-10-242 Text en Copyright ©2010 Luo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Performance of criteria for selecting evolutionary models in phylogenetics: a comprehensive study based on simulated datasets
topic Research Article