
Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples

BACKGROUND: Usefulness of next-generation sequencing (NGS) in assessing bacteria associated with oral squamous cell carcinoma (OSCC) has been undermined by inability to classify reads to the species level. OBJECTIVE: The purpose of this study was to develop a robust algorithm for species-level class...

Bibliographic Details
Main Authors:	Al-Hebshi, Nezar Noor; Nasher, Akram Thabet; Idris, Ali Mohamed; Chen, Tsute
Format:	Online Article Text
Language:	English
Published:	Co-Action Publishing 2015
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4590409/ https://www.ncbi.nlm.nih.gov/pubmed/26426306 http://dx.doi.org/10.3402/jom.v7.28934

_version_	1782392929301561344
author	Al-Hebshi, Nezar Noor; Nasher, Akram Thabet; Idris, Ali Mohamed; Chen, Tsute
author_facet	Al-Hebshi, Nezar Noor; Nasher, Akram Thabet; Idris, Ali Mohamed; Chen, Tsute
author_sort	Al-Hebshi, Nezar Noor
collection	PubMed
description	BACKGROUND: Usefulness of next-generation sequencing (NGS) in assessing bacteria associated with oral squamous cell carcinoma (OSCC) has been undermined by inability to classify reads to the species level. OBJECTIVE: The purpose of this study was to develop a robust algorithm for species-level classification of NGS reads from oral samples and to pilot test it for profiling bacteria within OSCC tissues. METHODS: Bacterial 16S V1-V3 libraries were prepared from three OSCC DNA samples and sequenced using 454's FLX chemistry. High-quality, well-aligned, and non-chimeric reads ≥350 bp were classified using a novel, multi-stage algorithm that involves matching reads to reference sequences in revised versions of the Human Oral Microbiome Database (HOMD), HOMD extended (HOMDEXT), and Greengene Gold (GGG) at alignment coverage and percentage identity ≥98%, followed by assignment to species level based on top hit reference sequences. Priority was given to hits in HOMD, then HOMDEXT, and finally GGG. Unmatched reads were subjected to operational taxonomic unit analysis. RESULTS: Nearly 92.8% of the reads were matched to updated-HOMD 13.2, 1.83% to trusted-HOMDEXT, and 1.36% to modified-GGG. Of all matched reads, 99.6% were classified to species level. A total of 228 species-level taxa were identified, representing 11 phyla; the most abundant were Proteobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Actinobacteria. Thirty-five species-level taxa were detected in all samples. On average, Prevotella oris, Neisseria flava, Neisseria flavescens/subflava, Fusobacterium nucleatum ss polymorphum, Aggregatibacter segnis, Streptococcus mitis, and Fusobacterium periodonticum were the most abundant. Bacteroides fragilis, a species rarely isolated from the oral cavity, was detected in two samples. CONCLUSION: This multi-stage algorithm maximizes the fraction of reads classified to the species level while ensuring reliable classification by giving priority to the human, oral reference set. Applying the algorithm to OSCC samples revealed high diversity. In addition to oral taxa, a number of human, non-oral taxa were also identified, some of which are rarely detected in the oral cavity.
format	Online Article Text
id	pubmed-4590409
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Co-Action Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-4590409 2015-10-20 Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples. Al-Hebshi, Nezar Noor; Nasher, Akram Thabet; Idris, Ali Mohamed; Chen, Tsute. J Oral Microbiol. Original Article. Co-Action Publishing 2015-09-29 /pmc/articles/PMC4590409/ /pubmed/26426306 http://dx.doi.org/10.3402/jom.v7.28934 Text en © 2015 Nezar Noor Al-Hebshi et al. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License, permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Article Al-Hebshi, Nezar Noor Nasher, Akram Thabet Idris, Ali Mohamed Chen, Tsute Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples
title	Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4590409/ https://www.ncbi.nlm.nih.gov/pubmed/26426306 http://dx.doi.org/10.3402/jom.v7.28934
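
The database-priority step described in the abstract (match at ≥98% coverage and identity, prefer HOMD, then HOMDEXT, then GGG, and pass unmatched reads on to OTU analysis) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Hit` record, the `assign_species` function, the choice of percentage identity as the "top hit" criterion, and the joining of tied species names with "/" are all assumptions made for the example, and the sample hits are invented.

```python
from collections import defaultdict
from typing import NamedTuple

# Thresholds and database order as stated in the abstract.
DB_PRIORITY = ("HOMD", "HOMDEXT", "GGG")
MIN_IDENTITY = 98.0   # percent
MIN_COVERAGE = 98.0   # percent


class Hit(NamedTuple):
    """One read-to-reference alignment (hypothetical record layout)."""
    read_id: str
    database: str
    species: str
    identity: float
    coverage: float


def assign_species(read_ids, hits):
    """Map each read to (database, species label), or None if unmatched.

    Assumption: the "top hit" within a database is the hit with the
    highest percentage identity; ties across species are joined with "/".
    """
    # Keep only hits that pass both thresholds, grouped by read and database.
    passing = defaultdict(lambda: defaultdict(list))
    for hit in hits:
        if hit.identity >= MIN_IDENTITY and hit.coverage >= MIN_COVERAGE:
            passing[hit.read_id][hit.database].append(hit)

    assignments = {}
    for read_id in read_ids:
        assignments[read_id] = None  # unmatched reads would go to OTU analysis
        for database in DB_PRIORITY:
            db_hits = passing[read_id][database]
            if not db_hits:
                continue
            best = max(hit.identity for hit in db_hits)
            species = sorted({h.species for h in db_hits if h.identity == best})
            assignments[read_id] = (database, "/".join(species))
            break  # a higher-priority database wins; stop searching
    return assignments


# Invented example hits, for illustration only.
example_hits = [
    Hit("r1", "HOMD", "Prevotella oris", 99.5, 100.0),
    Hit("r1", "GGG", "Prevotella sp.", 100.0, 100.0),
    Hit("r2", "HOMDEXT", "Neisseria flavescens", 99.0, 99.0),
    Hit("r2", "HOMDEXT", "Neisseria subflava", 99.0, 99.0),
    Hit("r3", "GGG", "Bacteroides fragilis", 97.0, 100.0),
]
result = assign_species(["r1", "r2", "r3"], example_hits)
for read_id, assignment in result.items():
    print(read_id, assignment)
```

In this sketch, read r1 is assigned from HOMD even though its GGG hit has higher identity, because database priority is applied before comparing hits; r3 stays unmatched because its only hit falls below the 98% identity threshold.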
