16S-FASAS: an integrated pipeline for synthetic full-length 16S rRNA gene sequencing data analysis

BACKGROUND: The full-length 16S rRNA sequencing can better improve the taxonomic and phylogenetic resolution compared to the partial 16S rRNA gene sequencing. The 16S-FAS-NGS (16S rRNA full-length amplicon sequencing based on a next-generation sequencing platform) technology can generate high-qualit...

Bibliographic Details
Main authors: Zhang, Ke; Lin, Rongnan; Chang, Yujun; Zhou, Qing; Zhang, Zhi
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9511998/
https://www.ncbi.nlm.nih.gov/pubmed/36172503
http://dx.doi.org/10.7717/peerj.14043
_version_ 1784797759801393152
author Zhang, Ke
Lin, Rongnan
Chang, Yujun
Zhou, Qing
Zhang, Zhi
author_facet Zhang, Ke
Lin, Rongnan
Chang, Yujun
Zhou, Qing
Zhang, Zhi
author_sort Zhang, Ke
collection PubMed
description BACKGROUND: Full-length 16S rRNA gene sequencing improves taxonomic and phylogenetic resolution compared with partial 16S rRNA gene sequencing. The 16S-FAS-NGS (16S rRNA full-length amplicon sequencing based on a next-generation sequencing platform) technology can generate high-quality, full-length 16S rRNA gene sequences using short-read sequencers together with assembly procedures. However, there is a lack of a data analysis suite that can process and analyze the resulting synthetic long-read data. RESULTS: We developed software named 16S-FASAS (16S full-length amplicon sequencing data analysis software) for 16S-FAS-NGS data analysis, which provides high-fidelity species-level microbiome data. 16S-FASAS consists of data quality control, de novo assembly, annotation, and visualization modules. We verified the performance of 16S-FASAS on both mock and fecal samples. In mock communities, we showed that taxonomy assignment by MegaBLAST had fewer misclassifications and tended to find more low-abundance species than the USEARCH-UNOISE3-based classifier, resulting in species-level classification of 85.71% (6/7), 85.71% (6/7), 72.72% (8/11), and 70% (7/10) of the target bacteria. When applied to fecal samples, the 16S-FAS-NGS datasets generated contigs grouped into 60 and 56 species, of which 71.62% (43/60) and 76.79% (43/56) were shared with the PacBio datasets. CONCLUSIONS: 16S-FASAS is a valuable tool that helps researchers process and interpret the results of full-length 16S rRNA gene sequencing. Relying on full-length amplicon sequencing technology, the 16S-FASAS pipeline enables a more accurate report on the bacterial complexity of microbiome samples. 16S-FASAS is freely available for use at https://github.com/capitalbio-bioinfo/FASAS.
format Online
Article
Text
id pubmed-9511998
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-95119982022-09-27 16S-FASAS: an integrated pipeline for synthetic full-length 16S rRNA gene sequencing data analysis Zhang, Ke Lin, Rongnan Chang, Yujun Zhou, Qing Zhang, Zhi PeerJ Bioinformatics BACKGROUND: Full-length 16S rRNA gene sequencing improves taxonomic and phylogenetic resolution compared with partial 16S rRNA gene sequencing. The 16S-FAS-NGS (16S rRNA full-length amplicon sequencing based on a next-generation sequencing platform) technology can generate high-quality, full-length 16S rRNA gene sequences using short-read sequencers together with assembly procedures. However, there is a lack of a data analysis suite that can process and analyze the resulting synthetic long-read data. RESULTS: We developed software named 16S-FASAS (16S full-length amplicon sequencing data analysis software) for 16S-FAS-NGS data analysis, which provides high-fidelity species-level microbiome data. 16S-FASAS consists of data quality control, de novo assembly, annotation, and visualization modules. We verified the performance of 16S-FASAS on both mock and fecal samples. In mock communities, we showed that taxonomy assignment by MegaBLAST had fewer misclassifications and tended to find more low-abundance species than the USEARCH-UNOISE3-based classifier, resulting in species-level classification of 85.71% (6/7), 85.71% (6/7), 72.72% (8/11), and 70% (7/10) of the target bacteria. When applied to fecal samples, the 16S-FAS-NGS datasets generated contigs grouped into 60 and 56 species, of which 71.62% (43/60) and 76.79% (43/56) were shared with the PacBio datasets. CONCLUSIONS: 16S-FASAS is a valuable tool that helps researchers process and interpret the results of full-length 16S rRNA gene sequencing. Relying on full-length amplicon sequencing technology, the 16S-FASAS pipeline enables a more accurate report on the bacterial complexity of microbiome samples.
16S-FASAS is freely available for use at https://github.com/capitalbio-bioinfo/FASAS. PeerJ Inc. 2022-09-23 /pmc/articles/PMC9511998/ /pubmed/36172503 http://dx.doi.org/10.7717/peerj.14043 Text en ©2022 Zhang et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Zhang, Ke
Lin, Rongnan
Chang, Yujun
Zhou, Qing
Zhang, Zhi
16S-FASAS: an integrated pipeline for synthetic full-length 16S rRNA gene sequencing data analysis
title 16S-FASAS: an integrated pipeline for synthetic full-length 16S rRNA gene sequencing data analysis
title_full 16S-FASAS: an integrated pipeline for synthetic full-length 16S rRNA gene sequencing data analysis
title_fullStr 16S-FASAS: an integrated pipeline for synthetic full-length 16S rRNA gene sequencing data analysis
title_full_unstemmed 16S-FASAS: an integrated pipeline for synthetic full-length 16S rRNA gene sequencing data analysis
title_short 16S-FASAS: an integrated pipeline for synthetic full-length 16S rRNA gene sequencing data analysis
title_sort 16s-fasas: an integrated pipeline for synthetic full-length 16s rrna gene sequencing data analysis
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9511998/
https://www.ncbi.nlm.nih.gov/pubmed/36172503
http://dx.doi.org/10.7717/peerj.14043
work_keys_str_mv AT zhangke 16sfasasanintegratedpipelineforsyntheticfulllength16srrnagenesequencingdataanalysis
AT linrongnan 16sfasasanintegratedpipelineforsyntheticfulllength16srrnagenesequencingdataanalysis
AT changyujun 16sfasasanintegratedpipelineforsyntheticfulllength16srrnagenesequencingdataanalysis
AT zhouqing 16sfasasanintegratedpipelineforsyntheticfulllength16srrnagenesequencingdataanalysis
AT zhangzhi 16sfasasanintegratedpipelineforsyntheticfulllength16srrnagenesequencingdataanalysis
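
The abstract in this record reports how many species found in the 16S-FAS-NGS datasets were also found in the PacBio datasets, in the form "percent (shared/total)". As a rough illustration of that kind of comparison, here is a minimal Python sketch of a set overlap. It is not code from 16S-FASAS; the function name shared_fraction and the species labels are hypothetical placeholders, and the toy lists are not data from the article.

```python
def shared_fraction(reference_species, other_species):
    """Return (shared_count, reference_total, percent_shared).

    percent_shared is the share of distinct labels in reference_species
    that also occur in other_species.
    """
    reference = set(reference_species)
    shared = reference & set(other_species)
    if not reference:
        return 0, 0, 0.0
    percent = 100.0 * len(shared) / len(reference)
    return len(shared), len(reference), percent


# Hypothetical toy labels, NOT results from the article.
synthetic_long_read = ["species_A", "species_B", "species_C", "species_D"]
pacbio = ["species_B", "species_C", "species_D", "species_E"]

shared, total, percent = shared_fraction(synthetic_long_read, pacbio)
print(f"{percent:.2f}% ({shared}/{total})")
```

The denominator is the number of distinct species in the first list, which matches the "(shared/total)" layout used in the abstract. Swapping the arguments gives the overlap relative to the other dataset.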