
Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories

BACKGROUND: Despite recent decreases in the cost of sequencing, shotgun metagenome sequencing remains more expensive compared with 16S rRNA amplicon sequencing. Methods have been developed to predict the functional profiles of microbial communities based on their taxonomic composition. In this study...


Bibliographic Details
Main Authors:	Sun, Shan; Jones, Roshonda B.; Fodor, Anthony A.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Short Report
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7118876/ https://www.ncbi.nlm.nih.gov/pubmed/32241293 http://dx.doi.org/10.1186/s40168-020-00815-y

_version_	1783514654377508864
author	Sun, Shan; Jones, Roshonda B.; Fodor, Anthony A.
author_sort	Sun, Shan
collection	PubMed
description	BACKGROUND: Despite recent decreases in the cost of sequencing, shotgun metagenome sequencing remains more expensive than 16S rRNA amplicon sequencing. Methods have been developed to predict the functional profiles of microbial communities based on their taxonomic composition. In this study, we evaluated the performance of three commonly used metagenome prediction tools (PICRUSt, PICRUSt2, and Tax4Fun) by comparing the significance of the differential abundance of predicted functional gene profiles to those from shotgun metagenome sequencing across different environments. RESULTS: We selected 7 datasets of human, non-human animal, and environmental (soil) samples that have publicly available 16S rRNA and shotgun metagenome sequences. As we would expect based on previous literature, strong Spearman correlations were observed between predicted gene compositions and gene relative abundance measured with shotgun metagenome sequencing. However, these strong correlations were preserved even when the abundance of genes was permuted across samples. This suggests that a simple correlation coefficient is a highly unreliable measure for the performance of metagenome prediction tools. As an alternative, we compared the performance of genes predicted with PICRUSt, PICRUSt2, and Tax4Fun to sequenced metagenome genes in inference models associated with metadata within each dataset. With this approach, we found reasonable performance for human datasets, with the metagenome prediction tools performing better for inference on genes related to “housekeeping” functions. However, their performance degraded sharply outside of human datasets when used for inference. CONCLUSION: We conclude that the utility of PICRUSt, PICRUSt2, and Tax4Fun for inference with the default database is likely limited outside of human samples and that development of tools for gene prediction specific to different non-human and environmental samples is warranted.
format	Online Article Text
id	pubmed-7118876
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7118876 2020-04-07 Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories. Sun, Shan; Jones, Roshonda B.; Fodor, Anthony A. Microbiome. Short Report. BioMed Central 2020-04-02 /pmc/articles/PMC7118876/ /pubmed/32241293 http://dx.doi.org/10.1186/s40168-020-00815-y Text en © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories
topic	Short Report
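
The abstract's point that strong Spearman correlations survive permuting gene abundances across samples can be illustrated with a small simulation. The sketch below is not the authors' analysis and uses no real data: the gene count, sample count, noise levels, and the `simulate` helper are all assumptions chosen for illustration. It assumes that each gene has a baseline abundance that varies widely between genes, while sample-specific variation is comparatively small. Under that assumption, the per-sample correlation between "predicted" and "measured" profiles stays high even when each predicted sample is compared with a different measured sample.

```python
import random


def ranks(values):
    """Return 0-based ranks; assumes no ties (true for continuous draws)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def simulate(n_genes=200, n_samples=10, seed=0):
    """Hypothetical toy model working on log-abundances.

    Ranks are unchanged by the log transform, so Spearman is unaffected.
    """
    rng = random.Random(seed)
    # Assumed: between-gene spread (sd 3) dominates all other variation.
    baseline = [rng.gauss(0, 3) for _ in range(n_genes)]
    measured, predicted = [], []
    for _ in range(n_samples):
        # Sample-specific signal shared by the measurement and the prediction.
        truth = [b + rng.gauss(0, 0.5) for b in baseline]
        measured.append([t + rng.gauss(0, 0.3) for t in truth])
        predicted.append([t + rng.gauss(0, 0.3) for t in truth])
    # Matched: predicted sample s versus measured sample s.
    matched = [spearman(predicted[s], measured[s]) for s in range(n_samples)]
    # Permuted: a cyclic shift, so no sample is paired with itself.
    permuted = [
        spearman(predicted[s], measured[(s + 1) % n_samples])
        for s in range(n_samples)
    ]
    return sum(matched) / n_samples, sum(permuted) / n_samples


matched_rho, permuted_rho = simulate()
print(f"mean Spearman, matched samples:  {matched_rho:.3f}")
print(f"mean Spearman, permuted samples: {permuted_rho:.3f}")
```

In this toy setting both averages are high, and the gap between them is small compared with the correlations themselves. That is the sense in which a plain correlation coefficient says little about whether a prediction captures sample-specific signal.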
