Comparison of 16S and whole genome dog microbiomes using machine learning

BACKGROUND: Recent advances in sequencing technologies have driven studies identifying the microbiome as a key regulator of overall health and disease in the host. Both 16S amplicon and whole genome shotgun sequencing technologies are currently being used to investigate this relationship, however, t...

Bibliographic Details
Main Authors: Lewis, Scott; Nash, Andrea; Li, Qinghong; Ahn, Tae-Hyuk
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8379800/
https://www.ncbi.nlm.nih.gov/pubmed/34419136
http://dx.doi.org/10.1186/s13040-021-00270-x
author Lewis, Scott
Nash, Andrea
Li, Qinghong
Ahn, Tae-Hyuk
collection PubMed
description BACKGROUND: Recent advances in sequencing technologies have driven studies identifying the microbiome as a key regulator of overall health and disease in the host. Both 16S amplicon and whole genome shotgun sequencing technologies are currently used to investigate this relationship; however, the choice of sequencing technology often depends on the nature and experimental design of the study. In principle, the outputs rendered by analysis pipelines are heavily influenced by the data used as input; it is therefore important to consider that the genomic features produced by different sequencing technologies may emphasize different results.
RESULTS: In this work, we use public 16S amplicon and whole genome shotgun sequencing (WGS) data from the same dogs to investigate the relationship between sequencing technology and the captured gut metagenomic landscape in dogs. In our analyses, we compare the taxonomic resolution at the species and phylum levels and benchmark 12 classification algorithms in their ability to accurately identify host phenotype using only taxonomic relative abundance information from 16S and WGS datasets with identical study designs. Our best performing model, a random forest trained on the WGS dataset, identified a species (Bacteroides coprocola) that predominantly contributes to the abundance of leuB, a gene involved in branched-chain amino acid biosynthesis, a risk factor for glucose intolerance, insulin resistance, and type 2 diabetes. This trend was not conserved when we trained the model using 16S sequencing profiles from the same dogs.
CONCLUSIONS: Our results indicate that WGS sequencing of dog microbiomes detects greater taxonomic diversity than 16S sequencing of the same dogs, both at the species level and with respect to four gut-enriched phyla. This difference in detection does not significantly affect the performance metrics of machine learning algorithms after down-sampling. Although the important features extracted from our best performing model are not conserved between the two technologies, the important features extracted in either case indicate the utility of machine learning algorithms in identifying biologically meaningful relationships between the host and microbiome community members. In conclusion, this work provides the first systematic machine learning comparison of dog 16S and WGS microbiomes derived from identical study designs.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13040-021-00270-x.
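The abstract refers to "taxonomic relative abundance information" and to one technology detecting more taxa than the other. As a rough illustration only, the sketch below shows how raw per-taxon read counts could be turned into relative abundances and how the number of detected taxa could be compared between two profiles. The function names (`relative_abundance`, `richness`), the taxon lists, and all counts are hypothetical and invented for this example; they are not the authors' pipeline or data.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def richness(profile: Dict[str, float], min_abundance: float = 0.0) -> int:
    """Count the taxa whose relative abundance exceeds min_abundance."""
    return sum(1 for frac in profile.values() if frac > min_abundance)


# Toy, made-up counts for one hypothetical dog (NOT data from the study).
counts_16s = {"Bacteroides": 600, "Fusobacterium": 300, "Prevotella": 100}
counts_wgs = {
    "Bacteroides coprocola": 400,
    "Bacteroides plebeius": 250,
    "Fusobacterium mortiferum": 200,
    "Prevotella copri": 100,
    "Megamonas funiformis": 50,
}

profile_16s = relative_abundance(counts_16s)
profile_wgs = relative_abundance(counts_wgs)

# Compare how many taxa each hypothetical profile resolves.
print(richness(profile_16s), richness(profile_wgs))
```

Per-sample vectors of this kind are the sort of input a classifier such as a random forest would consume, with one feature per taxon; how the study itself normalized, down-sampled, or filtered its profiles is not stated in this record.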
format Online
Article
Text
id pubmed-8379800
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8379800 2021-08-23 Comparison of 16S and whole genome dog microbiomes using machine learning Lewis, Scott Nash, Andrea Li, Qinghong Ahn, Tae-Hyuk BioData Min Research BioMed Central 2021-08-21 /pmc/articles/PMC8379800/ /pubmed/34419136 http://dx.doi.org/10.1186/s13040-021-00270-x Text en © The Author(s) 2021
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Comparison of 16S and whole genome dog microbiomes using machine learning
topic Research