
Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling


Bibliographic Details
Main authors:	Stamatakis, Alexandros; Göker, Markus; Grimm, Guido W.
Format:	Text
Language:	English
Published:	Libertas Academica, 2010
Subjects:	Original Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2880847/ https://www.ncbi.nlm.nih.gov/pubmed/20535232

author	Stamatakis, Alexandros; Göker, Markus; Grimm, Guido W.
collection	PubMed
description	The constant accumulation of sequence data poses new computational and methodological challenges for phylogenetic inference, since multiple sequence alignments grow both in the horizontal (number of base pairs, phylogenomic alignments) and in the vertical (number of taxa) dimension. Leaving aside the ongoing controversial discussion about appropriate models, partitioning schemes, and assembly methods for phylogenomic alignments, as well as the high computational cost of inferring them, for many organismic groups a sufficient number of taxa is often available from only one or a few genes (e.g., rbcL, matK, rDNA). In this paper we address the scalability of Maximum-Likelihood-based phylogeny reconstruction with respect to the number of taxa, using as examples several large nested single-gene rbcL alignments comprising 400 to 3,491 taxa. To test the effect of taxon sampling, we employ an appropriately adapted taxon jackknifing approach. In contrast to standard jackknifing, this taxon subsampling procedure is not entirely random; instead, subsamples are drawn from empirical taxon groups, which can either be user-defined or determined from taxonomic information in databases. Our results indicate that, despite an unfavorable ratio of number of sequences to number of base pairs (i.e., many relatively short sequences), Maximum Likelihood tree searches and bootstrap analyses scale well on single-gene rbcL alignments with dense taxon sampling of up to several thousand sequences. Moreover, the newly implemented taxon subsampling procedure can be beneficial for inferring higher-level relationships and for interpreting bootstrap support from comprehensive analyses.
format	Text
id	pubmed-2880847
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	Libertas Academica
record_format	MEDLINE/PubMed
journal	Evol Bioinform Online
published	Libertas Academica, 2010-05-24
record	pubmed-2880847, 2010-06-09
rights	© 2010 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
title	Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling
topic	Original Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2880847/ https://www.ncbi.nlm.nih.gov/pubmed/20535232
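The group-based taxon subsampling summarized in the description can be illustrated with a short Python sketch. It is not the authors' implementation: the function name `group_jackknife`, the `per_group` parameter, and the rule of keeping a fixed number of randomly chosen taxa from every group in each replicate are assumptions made for illustration only. Each returned subsample would then be analyzed separately (for example, with a Maximum Likelihood tree search) before support is summarized across replicates.

```python
import random
from collections import defaultdict


def group_jackknife(taxon_groups, per_group, replicates, seed=None):
    """Draw taxon subsamples stratified by group (hypothetical sketch).

    taxon_groups: dict mapping taxon name -> group label (user-defined or
                  taken from a taxonomy database).
    per_group:    number of taxa kept from each group per replicate (an
                  assumed rule); smaller groups contribute all members.
    replicates:   number of subsamples to draw.
    Returns a list of sorted taxon lists, one per replicate.
    """
    rng = random.Random(seed)

    # Collect the members of each empirical taxon group.
    members = defaultdict(list)
    for taxon, group in taxon_groups.items():
        members[group].append(taxon)

    subsamples = []
    for _ in range(replicates):
        chosen = []
        # Sample within every group instead of across the whole taxon set,
        # so each replicate still represents all groups.
        for group in sorted(members):
            taxa = members[group]
            k = min(per_group, len(taxa))
            chosen.extend(rng.sample(taxa, k))
        subsamples.append(sorted(chosen))
    return subsamples


if __name__ == "__main__":
    # Hypothetical toy data: six taxa in three groups.
    groups = {
        "t1": "A", "t2": "A", "t3": "A",
        "t4": "B", "t5": "B",
        "t6": "C",
    }
    for sample in group_jackknife(groups, per_group=1, replicates=3, seed=1):
        print(sample)
```

With `per_group=1`, every replicate in the toy example contains exactly one taxon from each of the three groups, which is the property that distinguishes this stratified scheme from drawing taxa entirely at random.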

