AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees
A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to d...
Main Authors: | Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2014 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4044049/ https://www.ncbi.nlm.nih.gov/pubmed/24892935 http://dx.doi.org/10.1371/journal.pone.0098844 |
_version_ | 1782319070456053760 |
---|---|
author | Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying |
author_facet | Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying |
author_sort | Zhou, Chan |
collection | PubMed |
description | A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php. |
format | Online Article Text |
id | pubmed-4044049 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4044049 2014-06-09 AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees Zhou, Chan Mao, Fenglou Yin, Yanbin Huang, Jinling Gogarten, Johann Peter Xu, Ying PLoS One Research Article A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php. Public Library of Science 2014-06-03 /pmc/articles/PMC4044049/ /pubmed/24892935 http://dx.doi.org/10.1371/journal.pone.0098844 Text en © 2014 Zhou et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Zhou, Chan Mao, Fenglou Yin, Yanbin Huang, Jinling Gogarten, Johann Peter Xu, Ying AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees |
title | AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees |
title_full | AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees |
title_fullStr | AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees |
title_full_unstemmed | AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees |
title_short | AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees |
title_sort | ast: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4044049/ https://www.ncbi.nlm.nih.gov/pubmed/24892935 http://dx.doi.org/10.1371/journal.pone.0098844 |
work_keys_str_mv | AT zhouchan astanautomatedsequencesamplingmethodforimprovingthetaxonomicdiversityofgenephylogenetictrees AT maofenglou astanautomatedsequencesamplingmethodforimprovingthetaxonomicdiversityofgenephylogenetictrees AT yinyanbin astanautomatedsequencesamplingmethodforimprovingthetaxonomicdiversityofgenephylogenetictrees AT huangjinling astanautomatedsequencesamplingmethodforimprovingthetaxonomicdiversityofgenephylogenetictrees AT gogartenjohannpeter astanautomatedsequencesamplingmethodforimprovingthetaxonomicdiversityofgenephylogenetictrees AT xuying astanautomatedsequencesamplingmethodforimprovingthetaxonomicdiversityofgenephylogenetictrees |