ensembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences
BACKGROUND: High-throughput sequencing of phylogenetically informative marker genes is a widely used method to assess the diversity and composition of microbial communities. Taxonomic assignment of sampled marker gene sequences (referred to as amplicon sequence variants, or ASVs) imparts ecological...
Main authors: | Catlett, Dylan; Son, Kevin; Liang, Connie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2021 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8320524/ https://www.ncbi.nlm.nih.gov/pubmed/34395092 http://dx.doi.org/10.7717/peerj.11865 |
_version_ | 1783730659637854208 |
---|---|
author | Catlett, Dylan; Son, Kevin; Liang, Connie |
author_facet | Catlett, Dylan; Son, Kevin; Liang, Connie |
author_sort | Catlett, Dylan |
collection | PubMed |
description | BACKGROUND: High-throughput sequencing of phylogenetically informative marker genes is a widely used method to assess the diversity and composition of microbial communities. Taxonomic assignment of sampled marker gene sequences (referred to as amplicon sequence variants, or ASVs) imparts ecological significance to these genetic data. To assign taxonomy to an ASV, a taxonomic assignment algorithm compares the ASV to a collection of reference sequences (a reference database) with known taxonomic affiliations. However, many taxonomic assignment algorithms and reference databases are available, and the optimal algorithm and database for a particular scientific question is often unclear. Here, we present the ensembleTax R package, which provides an efficient framework for integrating taxonomic assignments predicted with any number of taxonomic assignment algorithms and reference databases to determine ensemble taxonomic assignments for ASVs. METHODS: The ensembleTax R package relies on two core algorithms: taxmapper and assign.ensembleTax. The taxmapper algorithm maps taxonomic assignments derived from one reference database onto the taxonomic nomenclature (a set of taxonomic naming and ranking conventions) of another reference database. The assign.ensembleTax algorithm computes ensemble taxonomic assignments for each ASV in a data set based on any number of taxonomic assignments determined with independent methods. Various parameters allow analysts to prioritize obtaining either more ASVs with more predicted clade names or more robust clade name predictions supported by multiple independent methods in ensemble taxonomic assignments. RESULTS: The ensembleTax R package is used to compute two sets of ensemble taxonomic assignments for a collection of protistan ASVs sampled from the coastal ocean. Comparisons of taxonomic assignments predicted by individual methods with those predicted by ensemble methods show that conservative implementations of the ensembleTax package minimize disagreements between taxonomic assignments predicted by individual and ensemble methods, but result in ASVs with fewer ranks assigned taxonomy. Less conservative implementations of the ensembleTax package result in an increased fraction of ASVs classified at all taxonomic ranks, but increase the number of ASVs for which ensemble assignments disagree with those predicted by individual methods. DISCUSSION: We discuss how implementation of the ensembleTax R package may be optimized to address specific scientific objectives based on the results of the application of the ensembleTax package to marine protist communities. While further work is required to evaluate the accuracy of ensemble taxonomic assignments relative to taxonomic assignments predicted by individual methods, we also discuss scenarios where ensemble methods are expected to improve the accuracy of taxonomy prediction for ASVs. |
format | Online Article Text |
id | pubmed-8320524 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8320524 2021-08-13 ensembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences Catlett, Dylan Son, Kevin Liang, Connie PeerJ Bioinformatics BACKGROUND: High-throughput sequencing of phylogenetically informative marker genes is a widely used method to assess the diversity and composition of microbial communities. Taxonomic assignment of sampled marker gene sequences (referred to as amplicon sequence variants, or ASVs) imparts ecological significance to these genetic data. To assign taxonomy to an ASV, a taxonomic assignment algorithm compares the ASV to a collection of reference sequences (a reference database) with known taxonomic affiliations. However, many taxonomic assignment algorithms and reference databases are available, and the optimal algorithm and database for a particular scientific question is often unclear. Here, we present the ensembleTax R package, which provides an efficient framework for integrating taxonomic assignments predicted with any number of taxonomic assignment algorithms and reference databases to determine ensemble taxonomic assignments for ASVs. METHODS: The ensembleTax R package relies on two core algorithms: taxmapper and assign.ensembleTax. The taxmapper algorithm maps taxonomic assignments derived from one reference database onto the taxonomic nomenclature (a set of taxonomic naming and ranking conventions) of another reference database. The assign.ensembleTax algorithm computes ensemble taxonomic assignments for each ASV in a data set based on any number of taxonomic assignments determined with independent methods. Various parameters allow analysts to prioritize obtaining either more ASVs with more predicted clade names or more robust clade name predictions supported by multiple independent methods in ensemble taxonomic assignments. RESULTS: The ensembleTax R package is used to compute two sets of ensemble taxonomic assignments for a collection of protistan ASVs sampled from the coastal ocean. Comparisons of taxonomic assignments predicted by individual methods with those predicted by ensemble methods show that conservative implementations of the ensembleTax package minimize disagreements between taxonomic assignments predicted by individual and ensemble methods, but result in ASVs with fewer ranks assigned taxonomy. Less conservative implementations of the ensembleTax package result in an increased fraction of ASVs classified at all taxonomic ranks, but increase the number of ASVs for which ensemble assignments disagree with those predicted by individual methods. DISCUSSION: We discuss how implementation of the ensembleTax R package may be optimized to address specific scientific objectives based on the results of the application of the ensembleTax package to marine protist communities. While further work is required to evaluate the accuracy of ensemble taxonomic assignments relative to taxonomic assignments predicted by individual methods, we also discuss scenarios where ensemble methods are expected to improve the accuracy of taxonomy prediction for ASVs. PeerJ Inc. 2021-07-26 /pmc/articles/PMC8320524/ /pubmed/34395092 http://dx.doi.org/10.7717/peerj.11865 Text en ©2021 Catlett et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Catlett, Dylan Son, Kevin Liang, Connie ensembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences |
title | ensembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences |
title_full | ensembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences |
title_fullStr | ensembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences |
title_full_unstemmed | ensembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences |
title_short | ensembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences |
title_sort | ensembletax: an r package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8320524/ https://www.ncbi.nlm.nih.gov/pubmed/34395092 http://dx.doi.org/10.7717/peerj.11865 |
work_keys_str_mv | AT catlettdylan ensembletaxanrpackagefordeterminationsofensembletaxonomicassignmentsofphylogeneticallyinformativemarkergenesequences AT sonkevin ensembletaxanrpackagefordeterminationsofensembletaxonomicassignmentsofphylogeneticallyinformativemarkergenesequences AT liangconnie ensembletaxanrpackagefordeterminationsofensembletaxonomicassignmentsofphylogeneticallyinformativemarkergenesequences |