Primer, Pipelines, Parameters: Issues in 16S rRNA Gene Sequencing

Bibliographic Details
Main Authors: Abellan-Schneyder, Isabel; Matchado, Monica S.; Reitmeier, Sandra; Sommer, Alina; Sewald, Zeno; Baumbach, Jan; List, Markus; Neuhaus, Klaus
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2021
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8544895/
https://www.ncbi.nlm.nih.gov/pubmed/33627512
http://dx.doi.org/10.1128/mSphere.01202-20
collection PubMed
description Short-amplicon 16S rRNA gene sequencing is currently the method of choice for studies investigating microbiomes. However, comparative studies on differences in procedures are scarce. We sequenced human stool samples and mock communities with increasing complexity using a variety of commonly used protocols. Short amplicons targeting different variable regions (V-regions) or ranges thereof (V1-V2, V1-V3, V3-V4, V4, V4-V5, V6-V8, and V7-V9) were investigated for differences in the composition outcome due to primer choices. Next, the influence of clustering (operational taxonomic units [OTUs], zero-radius OTUs [zOTUs], and amplicon sequence variants [ASVs]), different databases (GreenGenes, the Ribosomal Database Project, Silva, the genomic-based 16S rRNA Database, and The All-Species Living Tree), and bioinformatic settings on taxonomic assignment were also investigated. We present a systematic comparison across all typically used V-regions using well-established primers. While it is known that the primer choice has a significant influence on the resulting microbial composition, we show that microbial profiles generated using different primer pairs need independent validation of performance. Further, comparing data sets across V-regions using different databases might be misleading due to differences in nomenclature (e.g., Enterorhabdus versus Adlercreutzia) and varying precisions in classification down to genus level. Overall, specific but important taxa are not picked up by certain primer pairs (e.g., Bacteroidetes is missed using primers 515F-944R) or due to the database used (e.g., Acetatifactor in GreenGenes and the genomic-based 16S rRNA Database). We found that appropriate truncation of amplicons is essential and different truncated-length combinations should be tested for each study. Finally, specific mock communities of sufficient and adequate complexity are highly recommended. 
IMPORTANCE In 16S rRNA gene sequencing, certain bacterial genera were found to be underrepresented or even missing in taxonomic profiles when using unsuitable primer combinations, outdated reference databases, or inadequate pipeline settings. Concerning the last, quality thresholds as well as bioinformatic settings (i.e., clustering approach, analysis pipeline, and specific adjustments such as truncation) are responsible for a number of observed differences between studies. Conclusions drawn by comparing one data set to another (e.g., between publications) appear to be problematic and require independent cross-validation using matching V-regions and uniform data processing. Therefore, we highlight the importance of a thought-out study design including sufficiently complex mock standards and appropriate V-region choice for the sample of interest. The use of processing pipelines and parameters must be tested beforehand.
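The abstract recommends testing different truncated-length combinations for each study. A minimal sketch of how such candidate combinations might be enumerated is given below. It is not the authors' pipeline: the function name `truncation_grid`, the amplicon length, the candidate lengths, and the minimum overlap of 12 bases are all hypothetical values chosen for illustration. It assumes paired-end reads that must overlap to be merged, and that the amplicon length is counted on the same basis as the reads (primers either included in both or removed from both).

```python
from itertools import product


def truncation_grid(amplicon_len, fwd_lengths, rev_lengths, min_overlap=12):
    """List (fwd, rev, overlap) truncation pairs that leave enough overlap to merge.

    For paired-end reads spanning an amplicon, the overlap after truncation is
    fwd + rev - amplicon_len. Pairs below min_overlap are dropped.
    All thresholds here are illustrative assumptions, not recommendations.
    """
    candidates = []
    for fwd, rev in product(fwd_lengths, rev_lengths):
        overlap = fwd + rev - amplicon_len
        if overlap >= min_overlap:
            candidates.append((fwd, rev, overlap))
    return candidates


if __name__ == "__main__":
    # Hypothetical amplicon of 460 bases and hypothetical candidate lengths.
    grid = truncation_grid(
        amplicon_len=460,
        fwd_lengths=[240, 260, 280],
        rev_lengths=[200, 220, 240],
    )
    for fwd, rev, overlap in grid:
        print(f"fwd={fwd} rev={rev} overlap={overlap}")
```

Each surviving combination would then be run through the chosen pipeline and compared against a mock community of known composition, which is the kind of validation the abstract calls for.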
id pubmed-8544895
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Published in mSphere (Research Article), American Society for Microbiology, 2021-02-24.
Copyright © 2021 Abellan-Schneyder et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).