16S-FASAS: an integrated pipeline for synthetic full-length 16S rRNA gene sequencing data analysis
BACKGROUND: Full-length 16S rRNA gene sequencing offers better taxonomic and phylogenetic resolution than partial 16S rRNA gene sequencing. The 16S-FAS-NGS (16S rRNA full-length amplicon sequencing based on a next-generation sequencing platform) technology can generate high-qualit...
Main Authors: | Zhang, Ke; Lin, Rongnan; Chang, Yujun; Zhou, Qing; Zhang, Zhi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2022 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9511998/ https://www.ncbi.nlm.nih.gov/pubmed/36172503 http://dx.doi.org/10.7717/peerj.14043 |
_version_ | 1784797759801393152 |
---|---|
author | Zhang, Ke; Lin, Rongnan; Chang, Yujun; Zhou, Qing; Zhang, Zhi |
collection | PubMed |
description | BACKGROUND: Full-length 16S rRNA gene sequencing offers better taxonomic and phylogenetic resolution than partial 16S rRNA gene sequencing. The 16S-FAS-NGS (16S rRNA full-length amplicon sequencing based on a next-generation sequencing platform) technology can generate high-quality, full-length 16S rRNA gene sequences using short-read sequencers together with assembly procedures. However, a data analysis suite for processing and analyzing the resulting synthetic long-read data has been lacking. RESULTS: We developed 16S-FASAS (16S full-length amplicon sequencing data analysis software) for 16S-FAS-NGS data analysis; it provides high-fidelity species-level microbiome data. 16S-FASAS consists of data quality control, de novo assembly, annotation, and visualization modules. We verified its performance on both mock and fecal samples. In mock communities, we showed that taxonomy assignment by MegaBLAST had fewer misclassifications and tended to find more low-abundance species than the USEARCH-UNOISE3-based classifier, resulting in species-level classification of 85.71% (6/7), 85.71% (6/7), 72.72% (8/11), and 70% (7/10) of the target bacteria. In fecal samples, the 16S-FAS-NGS datasets generated contigs grouped into 60 and 56 species, of which 71.62% (43/60) and 76.79% (43/56) were shared with the PacBio datasets. CONCLUSIONS: 16S-FASAS is a valuable tool that helps researchers process and interpret the results of full-length 16S rRNA gene sequencing. Relying on full-length amplicon sequencing technology, the 16S-FASAS pipeline enables a more accurate report on the bacterial complexity of microbiome samples. 16S-FASAS is freely available at https://github.com/capitalbio-bioinfo/FASAS. |
format | Online Article Text |
id | pubmed-9511998 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9511998 2022-09-27 16S-FASAS: an integrated pipeline for synthetic full-length 16S rRNA gene sequencing data analysis Zhang, Ke; Lin, Rongnan; Chang, Yujun; Zhou, Qing; Zhang, Zhi PeerJ Bioinformatics PeerJ Inc. 2022-09-23 /pmc/articles/PMC9511998/ /pubmed/36172503 http://dx.doi.org/10.7717/peerj.14043 Text en ©2022 Zhang et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
title | 16S-FASAS: an integrated pipeline for synthetic full-length 16S rRNA gene sequencing data analysis |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9511998/ https://www.ncbi.nlm.nih.gov/pubmed/36172503 http://dx.doi.org/10.7717/peerj.14043 |