
Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories

BACKGROUND: Despite recent decreases in the cost of sequencing, shotgun metagenome sequencing remains more expensive compared with 16S rRNA amplicon sequencing. Methods have been developed to predict the functional profiles of microbial communities based on their taxonomic composition. In this study...


Bibliographic Details
Main authors: Sun, Shan, Jones, Roshonda B., Fodor, Anthony A.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7118876/
https://www.ncbi.nlm.nih.gov/pubmed/32241293
http://dx.doi.org/10.1186/s40168-020-00815-y
author Sun, Shan
Jones, Roshonda B.
Fodor, Anthony A.
collection PubMed
description BACKGROUND: Despite recent decreases in the cost of sequencing, shotgun metagenome sequencing remains more expensive compared with 16S rRNA amplicon sequencing. Methods have been developed to predict the functional profiles of microbial communities based on their taxonomic composition. In this study, we evaluated the performance of three commonly used metagenome prediction tools (PICRUSt, PICRUSt2, and Tax4Fun) by comparing the significance of the differential abundance of predicted functional gene profiles to those from shotgun metagenome sequencing across different environments. RESULTS: We selected 7 datasets of human, non-human animal, and environmental (soil) samples that have publicly available 16S rRNA and shotgun metagenome sequences. As we would expect based on previous literature, strong Spearman correlations were observed between predicted gene compositions and gene relative abundance measured with shotgun metagenome sequencing. However, these strong correlations were preserved even when the abundance of genes was permuted across samples. This suggests that a simple correlation coefficient is a highly unreliable measure for the performance of metagenome prediction tools. As an alternative, we compared the performance of genes predicted with PICRUSt, PICRUSt2, and Tax4Fun to sequenced metagenome genes in inference models associated with metadata within each dataset. With this approach, we found reasonable performance for human datasets, with the metagenome prediction tools performing better for inference on genes related to “housekeeping” functions. However, their performance degraded sharply outside of human datasets when used for inference. CONCLUSION: We conclude that the utility of PICRUSt, PICRUSt2, and Tax4Fun for inference with the default database is likely limited outside of human samples and that development of tools for gene prediction specific to different non-human and environmental samples is warranted.
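The permutation result reported in the abstract can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the authors' analysis: the function names (`ranks`, `spearman`, `simulate`) and all noise parameters are hypothetical. It assumes that each gene has a baseline abundance spanning orders of magnitude and that sample-specific effects are comparatively small, so the per-sample Spearman correlation across genes is likely to stay high even after each gene's predicted values are shuffled across samples.

```python
import random
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate(n_genes=200, n_samples=20, seed=0):
    """Return mean per-sample Spearman rho before and after permutation."""
    rng = random.Random(seed)
    # Assumption: gene baselines vary far more than samples do.
    baseline = [rng.lognormvariate(0, 2) for _ in range(n_genes)]
    # Assumption: a modest sample-specific effect shared by both profiles.
    effect = [[rng.lognormvariate(0, 0.3) for _ in range(n_samples)]
              for _ in range(n_genes)]
    measured = [[baseline[g] * effect[g][s] * rng.lognormvariate(0, 0.2)
                 for s in range(n_samples)] for g in range(n_genes)]
    predicted = [[baseline[g] * effect[g][s] * rng.lognormvariate(0, 0.2)
                  for s in range(n_samples)] for g in range(n_genes)]
    # Shuffle each gene's predicted values across samples, which
    # destroys the sample-specific agreement with the measured profile.
    permuted = [rng.sample(row, len(row)) for row in predicted]

    def mean_rho(pred):
        return mean([
            spearman([pred[g][s] for g in range(n_genes)],
                     [measured[g][s] for g in range(n_genes)])
            for s in range(n_samples)
        ])

    return mean_rho(predicted), mean_rho(permuted)


rho_original, rho_permuted = simulate()
print(f"original: {rho_original:.3f}  permuted: {rho_permuted:.3f}")
```

Under these assumptions both coefficients are dominated by the between-gene differences in baseline abundance, which is one plausible reading of why the abstract calls a simple correlation coefficient an unreliable performance measure.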
format Online
Article
Text
id pubmed-7118876
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7118876 2020-04-07 Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories Sun, Shan Jones, Roshonda B. Fodor, Anthony A. Microbiome Short Report BioMed Central 2020-04-02 /pmc/articles/PMC7118876/ /pubmed/32241293 http://dx.doi.org/10.1186/s40168-020-00815-y Text en
© The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories
topic Short Report