
Reprocessing 16S rRNA Gene Amplicon Sequencing Studies: (Meta)Data Issues, Robustness, and Reproducibility

High-throughput sequencing technology provides an efficient method for evaluating microbial ecology. Different bioinformatics pipelines can be used to convert 16S ribosomal RNA gene amplicon sequencing data into an operational taxonomic unit (OTU) table that is used to analyze microbial communities....


Bibliographic Details
Main authors: Kang, Xiongbin, Deng, Dong Mei, Crielaard, Wim, Brandt, Bernd W.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Cellular and Infection Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8566820/
https://www.ncbi.nlm.nih.gov/pubmed/34746021
http://dx.doi.org/10.3389/fcimb.2021.720637
collection PubMed
description High-throughput sequencing technology provides an efficient method for evaluating microbial ecology. Different bioinformatics pipelines can be used to convert 16S ribosomal RNA gene amplicon sequencing data into an operational taxonomic unit (OTU) table that is used to analyze microbial communities. It is important to assess the robustness of these pipelines, each with specific algorithms and/or parameters, and their influence on the outcome of statistical tests. Articles with publicly available datasets on the oral microbiome were searched for, and five datasets were retrieved. These were from studies on changes in microbiota related to smoking, oral cancer, caries, diabetes, or periodontitis. Next, the data was processed with four pipelines based on VSEARCH, USEARCH, mothur, and UNOISE3. OTU tables were rarefied, and differences in α-diversity and β-diversity were tested for different groups in a dataset. Finally, these results were checked for consistency among these example pipelines. Of articles that deposited data, only 57% made all sequencing and metadata available. When processing the datasets, issues were encountered, caused by read characteristics and differences between tools and their defaults in combination with a lack of detail in the methodology of the articles. In general, the four mainstream pipelines provided similar results, but importantly, P-values sometimes differed between pipelines beyond the significance threshold. Our results indicated that for published articles, the description of bioinformatics methods and data deposition should be improved, and regarding reproducibility, that analysis of multiple subsamples is required when using rarefying as library-size normalization method.
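The abstract's recommendation to analyse multiple subsamples when rarefying can be illustrated with a minimal Python sketch. It assumes that rarefying means randomly subsampling each sample's reads without replacement to a fixed depth, and it uses the Shannon index (α-diversity) and Bray-Curtis dissimilarity (β-diversity) purely as example metrics. The function names and the OTU counts are hypothetical; this is not the code of the VSEARCH, USEARCH, mothur, or UNOISE3 pipelines used by the authors.

```python
import math
import random
from statistics import mean, stdev


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's OTU counts."""
    if not 0 < depth <= sum(counts):
        raise ValueError("depth must be positive and not exceed the library size")
    # Expand counts into a pool with one entry (the OTU index) per read, then sample.
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    rarefied = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        rarefied[otu] += 1
    return rarefied


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same OTUs."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def repeated_shannon(counts, depth, n_iter=100, seed=0):
    """Shannon index of `n_iter` independent rarefactions of one sample."""
    rng = random.Random(seed)
    return [shannon(rarefy(counts, depth, rng)) for _ in range(n_iter)]


def repeated_bray_curtis(a, b, depth, n_iter=100, seed=0):
    """Bray-Curtis between two samples, each rarefied anew in every iteration."""
    rng = random.Random(seed)
    return [bray_curtis(rarefy(a, depth, rng), rarefy(b, depth, rng)) for _ in range(n_iter)]


if __name__ == "__main__":
    # Hypothetical OTU counts for two samples (not data from the study).
    s1 = [500, 300, 120, 50, 20, 5, 3, 1, 1]
    s2 = [200, 350, 100, 150, 80, 40, 30, 20, 10]
    h = repeated_shannon(s1, depth=200, n_iter=50, seed=42)
    bc = repeated_bray_curtis(s1, s2, depth=200, n_iter=50, seed=42)
    # A single rarefaction is one random draw; the spread shows how much it can vary.
    print(f"Shannon (s1): mean={mean(h):.3f}, sd={stdev(h):.3f}")
    print(f"Bray-Curtis (s1 vs s2): mean={mean(bc):.3f}, sd={stdev(bc):.3f}")
```

Summarising a metric, or a downstream test result, over many rarefactions rather than one draw is one way to act on the recommendation; the number of iterations and the depth are arbitrary choices in this sketch.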
id pubmed-8566820
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Front Cell Infect Microbiol (Cellular and Infection Microbiology), Frontiers Media S.A., 2021-10-21
license Copyright © 2021 Kang, Deng, Crielaard and Brandt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Cellular and Infection Microbiology