
Incorporation of Data From Multiple Hypervariable Regions when Analyzing Bacterial 16S rRNA Gene Sequencing Data

Short-read 16S rRNA amplicon sequencing is a common technique used in microbiome research. However, inaccuracies in estimated bacterial community composition can occur due to amplification bias of the targeted hypervariable region. A potential solution is to sequence and assess multiple hypervariab...

Bibliographic Details
Main Authors: Jones, Carli B., White, James R., Ernst, Sarah E., Sfanos, Karen S., Peiffer, Lauren B.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9009396/
https://www.ncbi.nlm.nih.gov/pubmed/35432480
http://dx.doi.org/10.3389/fgene.2022.799615
author Jones, Carli B.
White, James R.
Ernst, Sarah E.
Sfanos, Karen S.
Peiffer, Lauren B.
collection PubMed
description Short-read 16S rRNA amplicon sequencing is a common technique used in microbiome research. However, inaccuracies in estimated bacterial community composition can occur due to amplification bias of the targeted hypervariable region. A potential solution is to sequence and assess multiple hypervariable regions in tandem, yet there is currently no consensus on the appropriate method for analyzing these data. Additionally, there are many sequence analysis resources for data produced from the Illumina platform, but fewer open-source options available for data from the Ion Torrent platform. Herein, we present an analysis pipeline using open-source analysis platforms that integrates data from multiple hypervariable regions and is compatible with data produced from the Ion Torrent platform. We used the ThermoFisher Ion 16S Metagenomics Kit and a mock community of twenty bacterial strains to assess taxonomic classification of six amplicons from separate hypervariable regions (V2, V3, V4, V6-7, V8, V9) using our analysis pipeline. We report that different amplicons have different specificities for taxonomic classification, which also has implications for global-level analyses such as alpha and beta diversity. Finally, we utilize a generalized linear modeling approach to statistically integrate the results from multiple hypervariable regions and apply this methodology to data from a representative clinical cohort. We conclude that examining sequencing results across multiple hypervariable regions provides more taxonomic information than sequencing a single region. The data across multiple hypervariable regions can be combined using generalized linear models to enhance the statistical evaluation of overall differences in community structure and relatedness among sample groups.
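The description mentions combining results from several hypervariable regions with a generalized linear model. The sketch below is only an illustration of that general idea, not the authors' pipeline: it models a per-sample alpha diversity value as a function of sample group while adjusting for the amplicon region. The function names, the toy values, and the choice of a Gaussian family with identity link (which reduces to least squares) are all assumptions made for this example. It also ignores that the same sample is measured once per region, which a real analysis would likely need to account for.

```python
import numpy as np

# Region labels taken from the abstract; everything else here is hypothetical.
REGIONS = ["V2", "V3", "V4", "V6-7", "V8", "V9"]


def shannon(counts):
    """Shannon diversity (natural log) of one vector of taxon counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def design_matrix(groups, regions):
    """Columns: intercept, group indicator (0/1), and one dummy column per
    region except the first, which serves as the reference level."""
    groups = np.asarray(groups, dtype=float)
    cols = [np.ones_like(groups), groups]
    for r in REGIONS[1:]:
        cols.append(np.array([1.0 if x == r else 0.0 for x in regions]))
    return np.column_stack(cols)


def fit_gaussian_glm(X, y):
    """Gaussian GLM with identity link, fitted by ordinary least squares."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


# Toy, noise-free data (assumed values): four samples, two per group,
# each measured with all six amplicons.
sample_groups = [0, 0, 1, 1]
region_offset = {"V2": 0.0, "V3": 0.2, "V4": -0.1, "V6-7": 0.3, "V8": -0.2, "V9": 0.1}
groups, regions, diversity = [], [], []
for g in sample_groups:
    for r in REGIONS:
        groups.append(g)
        regions.append(r)
        diversity.append(2.0 + 0.5 * g + region_offset[r])

X = design_matrix(groups, regions)
beta = fit_gaussian_glm(X, diversity)
group_effect = beta[1]  # group difference after adjusting for region
print("estimated group effect:", round(float(group_effect), 3))
```

In practice the diversity values would come from something like `shannon` applied to each sample's taxon counts per region, and a statistics package would supply standard errors and tests for the group coefficient.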
format Online
Article
Text
id pubmed-9009396
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9009396 2022-04-15 Incorporation of Data From Multiple Hypervariable Regions when Analyzing Bacterial 16S rRNA Gene Sequencing Data. Jones, Carli B.; White, James R.; Ernst, Sarah E.; Sfanos, Karen S.; Peiffer, Lauren B. Front Genet. Genetics. Frontiers Media S.A. 2022-03-31 /pmc/articles/PMC9009396/ /pubmed/35432480 http://dx.doi.org/10.3389/fgene.2022.799615 Text en
Copyright © 2022 Jones, White, Ernst, Sfanos and Peiffer. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Incorporation of Data From Multiple Hypervariable Regions when Analyzing Bacterial 16S rRNA Gene Sequencing Data
topic Genetics