Amplicon Sequence Variants Artificially Split Bacterial Genomes into Separate Clusters

Amplicon sequencing variants (ASVs) have been proposed as an alternative to operational taxonomic units (OTUs) for analyzing microbial communities. ASVs have grown in popularity, in part because of a desire to reflect a more refined level of taxonomy since they do not cluster sequences based on a di...

Bibliographic Details
Main author: Schloss, Patrick D.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8386465/
https://www.ncbi.nlm.nih.gov/pubmed/34287003
http://dx.doi.org/10.1128/mSphere.00191-21
_version_ 1783742268393390080
author Schloss, Patrick D.
collection PubMed
description Amplicon sequencing variants (ASVs) have been proposed as an alternative to operational taxonomic units (OTUs) for analyzing microbial communities. ASVs have grown in popularity, in part because of a desire to reflect a more refined level of taxonomy since they do not cluster sequences based on a distance-based threshold. However, ASVs and the use of overly narrow thresholds to identify OTUs increase the risk of splitting a single genome into separate clusters. To assess this risk, I analyzed the intragenomic variation of 16S rRNA genes from the bacterial genomes represented in an rrn copy number database, which contained 20,427 genomes from 5,972 species. As the number of copies of the 16S rRNA gene increased in a genome, the number of ASVs also increased. There was an average of 0.58 ASVs per copy of the 16S rRNA gene for full-length 16S rRNA genes. It was necessary to use a distance threshold of 5.25% to cluster full-length ASVs from the same genome into a single OTU with 95% confidence for genomes with 7 copies of the 16S rRNA gene, such as Escherichia coli. This research highlights the risk of splitting a single bacterial genome into separate clusters when ASVs are used to analyze 16S rRNA gene sequence data. Although there is also a risk of clustering ASVs from different species into the same OTU when using broad distance thresholds, that risk is of less concern than artificially splitting a genome into separate ASVs and OTUs.
IMPORTANCE 16S rRNA gene sequencing has engendered significant interest in studying microbial communities. There has been tension between trying to classify 16S rRNA gene sequences to increasingly lower taxonomic levels and the reality that those levels were defined using more sequence and physiological information than is available from a fragment of the 16S rRNA gene. Furthermore, the naming of bacterial taxa reflects the biases of those who name them. One motivation for the recent push to adopt ASVs in place of OTUs in microbial community analyses is to allow researchers to perform their analyses at the finest possible level that reflects species-level taxonomy. The current research is significant because it quantifies the risk of artificially splitting bacterial genomes into separate clusters. Far from providing a better representation of bacterial taxonomy and biology, the ASV approach can lead to conflicting inferences about the ecology of different ASVs from the same genome.
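The two quantities reported in the abstract (ASVs per 16S rRNA gene copy, and the distance threshold needed to merge a genome's ASVs into one OTU) can be illustrated with a minimal Python sketch. This is not the author's pipeline: the function names, the toy sequences, the use of a plain mismatch fraction on pre-aligned, equal-length sequences, and the furthest-neighbor criterion (largest pairwise distance) are all assumptions made for illustration.

```python
from itertools import combinations


def mismatch_fraction(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def summarize_genome(copies: list[str]) -> dict:
    """Summarize intragenomic 16S variation for one genome (hypothetical helper).

    `copies` holds one aligned sequence per 16S rRNA gene copy in the genome.
    """
    if not copies:
        raise ValueError("a genome needs at least one 16S rRNA gene copy")
    # Each distinct sequence among the copies counts as one ASV.
    asvs = sorted(set(copies))
    # Assumption: with furthest-neighbor clustering, all ASVs fall into one
    # OTU once the threshold reaches the largest pairwise distance.
    min_threshold = max(
        (mismatch_fraction(a, b) for a, b in combinations(asvs, 2)),
        default=0.0,
    )
    return {
        "n_copies": len(copies),
        "n_asvs": len(asvs),
        "asvs_per_copy": len(asvs) / len(copies),
        "min_threshold": min_threshold,
    }


# Toy genome with four copies, two of them identical (made-up sequences).
toy_genome = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
]
summary = summarize_genome(toy_genome)
print(summary)
```

In this toy genome the four copies collapse to three ASVs, so an ASV-level analysis would report one organism as three units unless a sufficiently broad threshold were applied.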
format Online
Article
Text
id pubmed-8386465
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-8386465 2021-09-09 Amplicon Sequence Variants Artificially Split Bacterial Genomes into Separate Clusters Schloss, Patrick D. mSphere Observation American Society for Microbiology 2021-07-21 /pmc/articles/PMC8386465/ /pubmed/34287003 http://dx.doi.org/10.1128/mSphere.00191-21 Text en Copyright © 2021 Schloss. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title Amplicon Sequence Variants Artificially Split Bacterial Genomes into Separate Clusters
topic Observation
work_keys_str_mv AT schlosspatrickd ampliconsequencevariantsartificiallysplitbacterialgenomesintoseparateclusters