Piphillin predicts metagenomic composition and dynamics from DADA2-corrected 16S rDNA sequences

Bibliographic Details
Main authors: Narayan, Nicole R., Weinmaier, Thomas, Laserna-Mendieta, Emilio J., Claesson, Marcus J., Shanahan, Fergus, Dabbagh, Karim, Iwai, Shoko, DeSantis, Todd Z.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6967091/
https://www.ncbi.nlm.nih.gov/pubmed/31952477
http://dx.doi.org/10.1186/s12864-019-6427-1
author Narayan, Nicole R.
Weinmaier, Thomas
Laserna-Mendieta, Emilio J.
Claesson, Marcus J.
Shanahan, Fergus
Dabbagh, Karim
Iwai, Shoko
DeSantis, Todd Z.
collection PubMed
description BACKGROUND: Shotgun metagenomic sequencing reveals the potential in microbial communities. However, lower-cost 16S ribosomal RNA (rRNA) gene sequencing provides taxonomic, not functional, observations. To remedy this, we previously introduced Piphillin, a software package that predicts functional metagenomic content based on the frequency of detected 16S rRNA gene sequences corresponding to genomes in regularly updated, functionally annotated genome databases. Piphillin (and similar tools) have previously been evaluated on 16S rRNA data processed by the clustering of sequences into operational taxonomic units (OTUs). New techniques such as amplicon sequence variant error correction are in increased use, but it is unknown if these techniques perform better in metagenomic content prediction pipelines, or if they should be treated the same as OTU data in respect to optimal pipeline parameters. RESULTS: To evaluate the effect of 16S rRNA sequence analysis method (clustering sequences into OTUs vs amplicon sequence variant error correction into amplicon sequence variants (ASVs)) on the ability of Piphillin to predict functional metagenomic content, we evaluated Piphillin-predicted functional content from 16S rRNA sequence data processed through OTU clustering and error correction into ASVs compared to corresponding shotgun metagenomic data. We show a strong correlation between metagenomic data and Piphillin-predicted functional content resulting from both 16S rRNA sequence analysis methods. Differential abundance testing with Piphillin-predicted functional content exhibited a low false positive rate (< 0.05) while capturing a large fraction of the differentially abundant features resulting from corresponding metagenomic data. However, Piphillin prediction performance was optimal at different cutoff parameters depending on 16S rRNA sequence analysis method. Using data analyzed with amplicon sequence variant error correction, Piphillin outperformed comparable tools, for instance exhibiting 19% greater balanced accuracy and 54% greater precision compared to PICRUSt2. CONCLUSIONS: Our results demonstrate that raw Illumina sequences should be processed for subsequent Piphillin analysis using amplicon sequence variant error correction (with DADA2 or similar methods) and run using a 99% ID cutoff for Piphillin, while sequences generated on platforms other than Illumina should be processed via OTU clustering (e.g., UPARSE) and run using a 96% ID cutoff for Piphillin. Piphillin is publicly available for academic users (Piphillin server. http://piphillin.secondgenome.com/.)
format Online
Article
Text
id pubmed-6967091
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6967091 2020-01-27 BMC Genomics Research Article BioMed Central 2020-01-17 /pmc/articles/PMC6967091/ /pubmed/31952477 http://dx.doi.org/10.1186/s12864-019-6427-1 Text en © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Piphillin predicts metagenomic composition and dynamics from DADA2-corrected 16S rDNA sequences
topic Research Article