AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees

Bibliographic Details
Main Authors: Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4044049/
https://www.ncbi.nlm.nih.gov/pubmed/24892935
http://dx.doi.org/10.1371/journal.pone.0098844
author Zhou, Chan
Mao, Fenglou
Yin, Yanbin
Huang, Jinling
Gogarten, Johann Peter
Xu, Ying
collection PubMed
description A challenge in phylogenetic inference of gene trees is how to sample a large pool of homologous sequences to derive a representative subset. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods cannot handle a large pool of sequences because of their high demand for computing resources; (2) applications analyzing a collection of gene trees prefer trees with fewer operational taxonomic units (OTUs), for instance when detecting horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, subsamples were often created by manual selection. Here we present AST, an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, which selects representative sequences that maximize the taxonomic diversity of the sample. To demonstrate the effectiveness of AST, we applied it to four problems: inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, of 16S ribosomal RNAs, and of glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of the computational results is almost as good as that of manual inference by domain experts, making the tool generally useful for phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php.
format Online
Article
Text
id pubmed-4044049
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4044049 2014-06-09 AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying PLoS One Research Article
Public Library of Science 2014-06-03 /pmc/articles/PMC4044049/ /pubmed/24892935 http://dx.doi.org/10.1371/journal.pone.0098844 Text en © 2014 Zhou et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees
topic Research Article