BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities

Bibliographic Details
Main Authors: Daisley, Brendan A., Reid, Gregor
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8546966/
https://www.ncbi.nlm.nih.gov/pubmed/33824193
http://dx.doi.org/10.1128/mSystems.00082-21
author Daisley, Brendan A.
Reid, Gregor
collection PubMed
description High-throughput 16S rRNA gene sequencing technologies have robust potential to improve our understanding of bee (Hymenoptera: Apoidea)-associated microbial communities and their impact on hive health and disease. Although recent computational algorithms now permit exact inference of high-resolution amplicon sequence variants (ASVs), the taxonomic classification of these ASVs remains a challenge due to inadequate reference databases. To address this, we assemble a comprehensive data set of all publicly available bee-associated 16S rRNA gene sequences, systematically annotate poorly resolved identities via inclusion of 618 placeholder labels for uncultivated microbial dark matter, and correct for phylogenetic inconsistencies using a complementary set of distance-based and maximum likelihood correction strategies. To benchmark the resultant database (BEExact), we compare performance against all existing reference databases in silico using a variety of classifier algorithms to produce probabilistic confidence scores. We also validate realistic classification rates on an independent set of ∼234 million short-read sequences derived from 32 studies encompassing 50 different bee types (36 eusocial and 14 solitary). Species-level classification rates on short-read ASVs range from 80 to 90% using BEExact (with ∼20% due to “bxid” placeholder names), whereas only ∼30% at best can be resolved with current universal databases. A series of data-driven recommendations are developed for future studies. We conclude that BEExact (https://github.com/bdaisley/BEExact) enables accurate and standardized microbiota profiling across a broad range of bee species. These two factors are of key importance to reproducibility and meaningful knowledge exchange within the scientific community, and together they can enhance the overall utility and ecological relevance of routine 16S rRNA gene-based sequencing endeavors.
IMPORTANCE The failure of current universal taxonomic databases to support the rapidly expanding field of bee microbiota research has led many investigators to rely on “in-house” reference sets or manual classification of sequence reads (usually based on BLAST searches), often with vague identity thresholds and subjective taxonomy choices. This time-consuming, error- and bias-prone process lacks standardization, cripples the potential for comparative cross-study analysis, and in many cases is likely to incorrectly sway study conclusions. BEExact builds on several complementary bioinformatic techniques to enable refined inference of bee host-associated microbial communities without requiring any other methodological modifications. It also bridges the gap between current practical outcomes (i.e., phylotype-to-genus level constraints with 97% operational taxonomic units [OTUs]) and the theoretical resolution (i.e., species-to-strain level classification with 100% ASVs) attainable in future microbiota investigations. Other niche habitats could also likely benefit from customized database curation via implementation of the novel approaches introduced in this study.
format Online
Article
Text
id pubmed-8546966
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-8546966 2021-10-27 BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities Daisley, Brendan A. Reid, Gregor mSystems Methods and Protocols American Society for Microbiology 2021-04-06 /pmc/articles/PMC8546966/ /pubmed/33824193 http://dx.doi.org/10.1128/mSystems.00082-21 Text en Copyright © 2021 Daisley and Reid. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities
topic Methods and Protocols
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8546966/
https://www.ncbi.nlm.nih.gov/pubmed/33824193
http://dx.doi.org/10.1128/mSystems.00082-21