Data-Driven Modeling for Species-Level Taxonomic Assignment From 16S rRNA: Application to Human Microbiomes

With the emergence of next-generation sequencing (NGS) technology, there have been a large number of metagenomic studies that estimated the bacterial composition via 16S ribosomal RNA (16S rRNA) amplicon sequencing. In particular, subsets of the hypervariable regions in 16S rRNA, such as V1–V2 and V...

Bibliographic Details
Main authors: Gwak, Ho-Jin; Rho, Mina
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7688474/
https://www.ncbi.nlm.nih.gov/pubmed/33262743
http://dx.doi.org/10.3389/fmicb.2020.570825
_version_ 1783613716022951936
author Gwak, Ho-Jin
Rho, Mina
collection PubMed
description With the emergence of next-generation sequencing (NGS) technology, there have been a large number of metagenomic studies that estimated the bacterial composition via 16S ribosomal RNA (16S rRNA) amplicon sequencing. In particular, subsets of the hypervariable regions in 16S rRNA, such as V1–V2 and V3–V4, are targeted using high-throughput sequencing. The sequences from different taxa are assigned to a specific taxon based on the sequence homology. Since such sequences are highly homologous or identical between species in the same genus, it is challenging to determine the exact species using 16S rRNA sequences only. Therefore, in this study, homologous species groups were defined to obtain maximum resolution related with species using 16S rRNA. For the taxonomic assignment using 16S rRNA, three major 16S rRNA databases are independently used since the lineage of certain bacteria is not consistent among these databases. On the basis of the NCBI taxonomy classification, we re-annotated inconsistent lineage information in three major 16S rRNA databases. For each species, we constructed a consensus sequence model for each hypervariable region and determined homologous species groups that consist of indistinguishable species in terms of sequence homology. Using a k-nearest neighbor method and the species consensus sequence models, the species-level taxonomy was determined. If the species determined is a member of homologous species groups, the species group is assigned instead of a specific species. Notably, the results of the evaluation on our method using simulated and mock datasets showed a high correlation with the real bacterial composition. Furthermore, in the analysis of real microbiome samples, such as salivary and gut microbiome samples, our method successfully performed species-level profiling and identified differences in the bacterial composition between different phenotypic groups.
format Online
Article
Text
id pubmed-7688474
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7688474 2020-11-30 Data-Driven Modeling for Species-Level Taxonomic Assignment From 16S rRNA: Application to Human Microbiomes Gwak, Ho-Jin Rho, Mina Front Microbiol Microbiology Frontiers Media S.A. 2020-11-12 /pmc/articles/PMC7688474/ /pubmed/33262743 http://dx.doi.org/10.3389/fmicb.2020.570825 Text en Copyright © 2020 Gwak and Rho. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Data-Driven Modeling for Species-Level Taxonomic Assignment From 16S rRNA: Application to Human Microbiomes
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7688474/
https://www.ncbi.nlm.nih.gov/pubmed/33262743
http://dx.doi.org/10.3389/fmicb.2020.570825