Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories
BACKGROUND: Despite recent decreases in the cost of sequencing, shotgun metagenome sequencing remains more expensive compared with 16S rRNA amplicon sequencing. Methods have been developed to predict the functional profiles of microbial communities based on their taxonomic composition. In this study...
Main authors: | Sun, Shan; Jones, Roshonda B.; Fodor, Anthony A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7118876/ https://www.ncbi.nlm.nih.gov/pubmed/32241293 http://dx.doi.org/10.1186/s40168-020-00815-y |
_version_ | 1783514654377508864 |
---|---|
author | Sun, Shan Jones, Roshonda B. Fodor, Anthony A. |
author_facet | Sun, Shan Jones, Roshonda B. Fodor, Anthony A. |
author_sort | Sun, Shan |
collection | PubMed |
description | BACKGROUND: Despite recent decreases in the cost of sequencing, shotgun metagenome sequencing remains more expensive compared with 16S rRNA amplicon sequencing. Methods have been developed to predict the functional profiles of microbial communities based on their taxonomic composition. In this study, we evaluated the performance of three commonly used metagenome prediction tools (PICRUSt, PICRUSt2, and Tax4Fun) by comparing the significance of the differential abundance of predicted functional gene profiles to those from shotgun metagenome sequencing across different environments. RESULTS: We selected 7 datasets of human, non-human animal, and environmental (soil) samples that have publicly available 16S rRNA and shotgun metagenome sequences. As we would expect based on previous literature, strong Spearman correlations were observed between predicted gene compositions and gene relative abundance measured with shotgun metagenome sequencing. However, these strong correlations were preserved even when the abundance of genes were permuted across samples. This suggests that simple correlation coefficient is a highly unreliable measure for the performance of metagenome prediction tools. As an alternative, we compared the performance of genes predicted with PICRUSt, PICRUSt2, and Tax4Fun to sequenced metagenome genes in inference models associated with metadata within each dataset. With this approach, we found reasonable performance for human datasets, with the metagenome prediction tools performing better for inference on genes related to “housekeeping” functions. However, their performance degraded sharply outside of human datasets when used for inference. CONCLUSION: We conclude that the utility of PICRUSt, PICRUSt2, and Tax4Fun for inference with the default database is likely limited outside of human samples and that development of tools for gene prediction specific to different non-human and environmental samples is warranted. |
format | Online Article Text |
id | pubmed-7118876 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7118876 2020-04-07 Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories Sun, Shan Jones, Roshonda B. Fodor, Anthony A. Microbiome Short Report BACKGROUND: Despite recent decreases in the cost of sequencing, shotgun metagenome sequencing remains more expensive compared with 16S rRNA amplicon sequencing. Methods have been developed to predict the functional profiles of microbial communities based on their taxonomic composition. In this study, we evaluated the performance of three commonly used metagenome prediction tools (PICRUSt, PICRUSt2, and Tax4Fun) by comparing the significance of the differential abundance of predicted functional gene profiles to those from shotgun metagenome sequencing across different environments. RESULTS: We selected 7 datasets of human, non-human animal, and environmental (soil) samples that have publicly available 16S rRNA and shotgun metagenome sequences. As we would expect based on previous literature, strong Spearman correlations were observed between predicted gene compositions and gene relative abundance measured with shotgun metagenome sequencing. However, these strong correlations were preserved even when the abundance of genes were permuted across samples. This suggests that simple correlation coefficient is a highly unreliable measure for the performance of metagenome prediction tools. As an alternative, we compared the performance of genes predicted with PICRUSt, PICRUSt2, and Tax4Fun to sequenced metagenome genes in inference models associated with metadata within each dataset. With this approach, we found reasonable performance for human datasets, with the metagenome prediction tools performing better for inference on genes related to “housekeeping” functions. However, their performance degraded sharply outside of human datasets when used for inference. CONCLUSION: We conclude that the utility of PICRUSt, PICRUSt2, and Tax4Fun for inference with the default database is likely limited outside of human samples and that development of tools for gene prediction specific to different non-human and environmental samples is warranted. BioMed Central 2020-04-02 /pmc/articles/PMC7118876/ /pubmed/32241293 http://dx.doi.org/10.1186/s40168-020-00815-y Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Short Report Sun, Shan Jones, Roshonda B. Fodor, Anthony A. Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories |
title | Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories |
title_full | Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories |
title_fullStr | Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories |
title_full_unstemmed | Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories |
title_short | Inference-based accuracy of metagenome prediction tools varies across sample types and functional categories |
title_sort | inference-based accuracy of metagenome prediction tools varies across sample types and functional categories |
topic | Short Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7118876/ https://www.ncbi.nlm.nih.gov/pubmed/32241293 http://dx.doi.org/10.1186/s40168-020-00815-y |
work_keys_str_mv | AT sunshan inferencebasedaccuracyofmetagenomepredictiontoolsvariesacrosssampletypesandfunctionalcategories AT jonesroshondab inferencebasedaccuracyofmetagenomepredictiontoolsvariesacrosssampletypesandfunctionalcategories AT fodoranthonya inferencebasedaccuracyofmetagenomepredictiontoolsvariesacrosssampletypesandfunctionalcategories |
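
Illustrative note: the abstract reports that Spearman correlations between predicted and sequenced gene abundances stayed strong even after gene abundances were permuted across samples. The sketch below is one way to see why that can happen. It is not the authors' code and uses no data from the study. The table sizes, noise levels, and every function name (`simulate_tables`, `permute_within_genes`, `mean_sample_spearman`) are assumptions made for illustration. The toy data give each gene a baseline abundance shared by all samples, so the "predicted" table carries no sample-specific signal at all.

```python
import math
import random
from statistics import mean


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate_tables(n_genes=200, n_samples=10, seed=0):
    """Build toy 'predicted' and 'measured' tables (genes x samples).

    Assumption: abundance = exp(gene baseline + independent noise), so
    differences between genes dominate differences between samples.
    """
    rng = random.Random(seed)
    baseline = [rng.gauss(0.0, 2.0) for _ in range(n_genes)]

    def table():
        return [[math.exp(b + rng.gauss(0.0, 0.3)) for _ in range(n_samples)]
                for b in baseline]

    return table(), table()


def permute_within_genes(table, rng):
    """Shuffle each gene's abundances across samples (returns a new table)."""
    return [rng.sample(row, len(row)) for row in table]


def mean_sample_spearman(predicted, measured):
    """Average, over samples, of the across-gene Spearman correlation."""
    n_samples = len(predicted[0])
    rhos = []
    for s in range(n_samples):
        pred_col = [row[s] for row in predicted]
        meas_col = [row[s] for row in measured]
        rhos.append(spearman(pred_col, meas_col))
    return mean(rhos)


if __name__ == "__main__":
    predicted, measured = simulate_tables()
    shuffled = permute_within_genes(predicted, random.Random(1))
    print("observed mean rho:", round(mean_sample_spearman(predicted, measured), 3))
    print("permuted mean rho:", round(mean_sample_spearman(shuffled, measured), 3))
```

In this toy setting the two printed values should be close to each other. The within-sample correlation mostly reflects which genes are abundant in general, not whether a tool captured sample-to-sample differences. This is consistent with the abstract's argument for comparing inference results instead of correlation coefficients.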