Comparison of 16S and whole genome dog microbiomes using machine learning
Main authors: | Lewis, Scott; Nash, Andrea; Li, Qinghong; Ahn, Tae-Hyuk |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8379800/ https://www.ncbi.nlm.nih.gov/pubmed/34419136 http://dx.doi.org/10.1186/s13040-021-00270-x |
_version_ | 1783741081151602688 |
---|---|
author | Lewis, Scott; Nash, Andrea; Li, Qinghong; Ahn, Tae-Hyuk |
author_sort | Lewis, Scott |
collection | PubMed |
description | BACKGROUND: Recent advances in sequencing technologies have driven studies identifying the microbiome as a key regulator of overall health and disease in the host. Both 16S amplicon and whole genome shotgun sequencing technologies are currently used to investigate this relationship; however, the choice of sequencing technology often depends on the nature and experimental design of the study. In principle, the outputs rendered by analysis pipelines are heavily influenced by the data used as input, so it is important to consider that the genomic features produced by different sequencing technologies may emphasize different results. RESULTS: In this work, we use public 16S amplicon and whole genome shotgun sequencing (WGS) data from the same dogs to investigate the relationship between sequencing technology and the captured gut metagenomic landscape. In our analyses, we compare the taxonomic resolution at the species and phylum levels and benchmark 12 classification algorithms on their ability to accurately identify host phenotype using only taxonomic relative abundance information from 16S and WGS datasets with identical study designs. Our best-performing model, a random forest trained on the WGS dataset, identified a species (Bacteroides coprocola) that predominantly contributes to the abundance of leuB, a gene involved in branched-chain amino acid biosynthesis, a risk factor for glucose intolerance, insulin resistance, and type 2 diabetes. This trend was not conserved when we trained the model using 16S sequencing profiles from the same dogs. CONCLUSIONS: Our results indicate that WGS of dog microbiomes detects greater taxonomic diversity than 16S sequencing of the same dogs at the species level and with respect to four gut-enriched phyla. This difference in detection does not significantly impact the performance metrics of machine learning algorithms after down-sampling. Although the important features extracted from our best-performing model are not conserved between the two technologies, the important features extracted in either case indicate the utility of machine learning algorithms in identifying biologically meaningful relationships between the host and microbiome community members. In conclusion, this work provides the first systematic machine learning comparison of dog 16S and WGS microbiomes derived from identical study designs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13040-021-00270-x. |
format | Online Article Text |
id | pubmed-8379800 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8379800 2021-08-23 Comparison of 16S and whole genome dog microbiomes using machine learning Lewis, Scott; Nash, Andrea; Li, Qinghong; Ahn, Tae-Hyuk BioData Min Research BioMed Central 2021-08-21 /pmc/articles/PMC8379800/ /pubmed/34419136 http://dx.doi.org/10.1186/s13040-021-00270-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Comparison of 16S and whole genome dog microbiomes using machine learning |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8379800/ https://www.ncbi.nlm.nih.gov/pubmed/34419136 http://dx.doi.org/10.1186/s13040-021-00270-x |