Taming the Selection of Optimal Substitution Models in Phylogenomics by Site Subsampling and Upsampling

The selection of the optimal substitution model of molecular evolution imposes a high computational burden for long sequence alignments in phylogenomics. We discovered that the analysis of multiple tiny subsamples of site patterns from a full sequence alignment recovers the correct optimal substitut...

Bibliographic Details
Main authors: Sharma, Sudip; Kumar, Sudhir
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9665063/
https://www.ncbi.nlm.nih.gov/pubmed/36306418
http://dx.doi.org/10.1093/molbev/msac236
collection PubMed
description The selection of the optimal substitution model of molecular evolution imposes a high computational burden for long sequence alignments in phylogenomics. We discovered that the analysis of multiple tiny subsamples of site patterns from a full sequence alignment recovers the correct optimal substitution model when sites in the subsample are upsampled to match the total number of sites in the full alignment. The computational costs of maximum-likelihood analyses are reduced by orders of magnitude in the subsample–upsample (SU) approach because the upsampled alignment contains only a small fraction of all site patterns. We present an adaptive protocol, ModelTamer, that implements the new SU approach and automatically selects subsamples to estimate optimal models reliably. ModelTamer selects models hundreds to thousands of times faster than the full data analysis while needing megabytes rather than gigabytes of computer memory.
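The subsample–upsample idea in the description above can be illustrated with a minimal Python sketch. It is not the ModelTamer implementation: the function name `subsample_upsample`, the `fraction` and `seed` parameters, and the choice to subsample alignment columns without replacement and then upsample them with replacement are assumptions made for illustration. The sketch only shows why the upsampled alignment is cheap to analyze: it has as many sites as the full alignment but far fewer distinct site patterns.

```python
import random
from collections import Counter


def subsample_upsample(alignment, fraction, seed=0):
    """Return site-pattern counts for one subsample-upsample replicate.

    alignment: list of equal-length sequences, one string per taxon.
    fraction:  share of sites drawn into the subsample (hypothetical parameter).
    seed:      seed for the random number generator (hypothetical parameter).
    """
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same length")

    # One column (site pattern) per alignment position.
    columns = ["".join(seq[i] for seq in alignment) for i in range(n_sites)]

    rng = random.Random(seed)
    k = max(1, round(fraction * n_sites))

    # Assumption: the subsample is drawn without replacement.
    subsample = rng.sample(columns, k)

    # Assumption: upsampling draws with replacement until the replicate
    # has as many sites as the full alignment.
    upsampled = rng.choices(subsample, k=n_sites)

    # Counting patterns compresses the replicate to at most k distinct columns.
    return Counter(upsampled)
```

The returned counts sum to the full alignment length, so they could serve as site-pattern weights in a likelihood calculation that visits each distinct pattern once. Repeating the call with different seeds would give the multiple subsamples that the description mentions.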
format Online
Article
Text
id pubmed-9665063
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9665063 2022-11-14 Taming the Selection of Optimal Substitution Models in Phylogenomics by Site Subsampling and Upsampling Sharma, Sudip Kumar, Sudhir Mol Biol Evol Methods The selection of the optimal substitution model of molecular evolution imposes a high computational burden for long sequence alignments in phylogenomics. We discovered that the analysis of multiple tiny subsamples of site patterns from a full sequence alignment recovers the correct optimal substitution model when sites in the subsample are upsampled to match the total number of sites in the full alignment. The computational costs of maximum-likelihood analyses are reduced by orders of magnitude in the subsample–upsample (SU) approach because the upsampled alignment contains only a small fraction of all site patterns. We present an adaptive protocol, ModelTamer, that implements the new SU approach and automatically selects subsamples to estimate optimal models reliably. ModelTamer selects models hundreds to thousands of times faster than the full data analysis while needing megabytes rather than gigabytes of computer memory. Oxford University Press 2022-10-28 /pmc/articles/PMC9665063/ /pubmed/36306418 http://dx.doi.org/10.1093/molbev/msac236 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methods