
Performance of Akaike Information Criterion and Bayesian Information Criterion in Selecting Partition Models and Mixture Models

In molecular phylogenetics, partition models and mixture models provide different approaches to accommodating heterogeneity in genomic sequencing data. Both types of models generally give a better fit to data than models that assume the process of sequence evolution is homogeneous across sites and...


Bibliographic Details
Main authors:	Liu, Qin; Charleston, Michael A; Richards, Shane A; Holland, Barbara R
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Spotlight Articles
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10198649/ https://www.ncbi.nlm.nih.gov/pubmed/36575813 http://dx.doi.org/10.1093/sysbio/syac081

_version_	1785044778282385408
author	Liu, Qin Charleston, Michael A Richards, Shane A Holland, Barbara R
author_sort	Liu, Qin
collection	PubMed
description	In molecular phylogenetics, partition models and mixture models provide different approaches to accommodating heterogeneity in genomic sequencing data. Both types of models generally give a better fit to data than models that assume the process of sequence evolution is homogeneous across sites and lineages. The Akaike Information Criterion (AIC), an estimator of Kullback–Leibler divergence, and the Bayesian Information Criterion (BIC) are popular tools to select models in phylogenetics. Recent work suggests that AIC should not be used for comparing mixture and partition models. In this work, we clarify that this difficulty is not fully explained by AIC misestimating the Kullback–Leibler divergence. We also investigate the performance of the AIC and BIC at comparing amongst mixture models and amongst partition models. We find that under nonstandard conditions (i.e., when some edges have a small expected number of changes), AIC underestimates the expected Kullback–Leibler divergence. Under such conditions, AIC preferred the complex mixture models and BIC preferred the simpler mixture models. The mixture models selected by AIC performed better in estimating the edge lengths, while the simpler models selected by BIC performed better in estimating the base frequencies and substitution rate parameters. In contrast, AIC and BIC both preferred simpler partition models over more complex partition models under nonstandard conditions, despite the fact that the more complex partition model was the generating model. We also investigated how mispartitioning (i.e., grouping sites that have not evolved under the same process) affects both the performance of partition models compared with mixture models and the model selection process. We found that as the level of mispartitioning increases, the bias of AIC in estimating the expected Kullback–Leibler divergence remains the same, and the branch lengths and evolutionary parameters estimated by partition models become less accurate. We recommend that researchers be cautious when using AIC and BIC to select among partition and mixture models; other alternatives, such as cross-validation and bootstrapping, should be explored, but may suffer similar limitations. [AIC; BIC; mispartitioning; partitioning; partition model; mixture model]
format	Online Article Text
id	pubmed-10198649
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10198649 2023-05-20 Performance of Akaike Information Criterion and Bayesian Information Criterion in Selecting Partition Models and Mixture Models Liu, Qin Charleston, Michael A Richards, Shane A Holland, Barbara R Syst Biol Spotlight Articles Oxford University Press 2022-12-28 /pmc/articles/PMC10198649/ /pubmed/36575813 http://dx.doi.org/10.1093/sysbio/syac081 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the Society of Systematic Biologists. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Performance of Akaike Information Criterion and Bayesian Information Criterion in Selecting Partition Models and Mixture Models
topic	Spotlight Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10198649/ https://www.ncbi.nlm.nih.gov/pubmed/36575813 http://dx.doi.org/10.1093/sysbio/syac081
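
The abstract reports that AIC tended to prefer more complex mixture models while BIC preferred simpler ones. A minimal sketch of why the two criteria can disagree is given below, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The log-likelihoods, parameter counts, and alignment length are hypothetical values chosen for illustration and are not taken from the article; treating n as the number of alignment sites is a common convention in phylogenetics and is assumed here.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fits, NOT results from the article.
n_sites = 1000  # assumed sample size: number of alignment sites
models = {
    "simple": {"lnL": -5230.0, "k": 10},
    "complex": {"lnL": -5215.0, "k": 20},
}

aic_scores = {name: aic(m["lnL"], m["k"]) for name, m in models.items()}
bic_scores = {name: bic(m["lnL"], m["k"], n_sites) for name, m in models.items()}

# The model with the lowest score is the one each criterion prefers.
best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)

print("AIC:", aic_scores, "->", best_by_aic)
print("BIC:", bic_scores, "->", best_by_bic)
```

With these made-up numbers the extra 10 parameters cost 20 units under AIC but about 69 units under BIC (10 × ln 1000), so a likelihood gain of 15 log units is enough for AIC to pick the complex model and not enough for BIC.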

