Cargando…

Conditional Forest Models Built Using Metagenomic Data Accurately Predicted Salmonella Contamination in Northeastern Streams

The use of water contaminated with Salmonella for produce production contributes to foodborne disease burden. To reduce human health risks, there is a need for novel, targeted approaches for assessing the pathogen status of agricultural water. We investigated the utility of water microbiome data for...

Descripción completa

Detalles Bibliográficos
Autores principales: Chung, Taejung, Yan, Runan, Weller, Daniel L., Kovac, Jasna
Formato: Online Artículo Texto
Lenguaje:English
Publicado: American Society for Microbiology 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10100987/
https://www.ncbi.nlm.nih.gov/pubmed/36946722
http://dx.doi.org/10.1128/spectrum.00381-23
_version_ 1785025408373096448
author Chung, Taejung
Yan, Runan
Weller, Daniel L.
Kovac, Jasna
author_facet Chung, Taejung
Yan, Runan
Weller, Daniel L.
Kovac, Jasna
author_sort Chung, Taejung
collection PubMed
description The use of water contaminated with Salmonella for produce production contributes to foodborne disease burden. To reduce human health risks, there is a need for novel, targeted approaches for assessing the pathogen status of agricultural water. We investigated the utility of water microbiome data for predicting Salmonella contamination of streams used to source water for produce production. Grab samples were collected from 60 New York streams in 2018 and tested for Salmonella. Separately, DNA was extracted from the samples and used for Illumina shotgun metagenomic sequencing. Reads were trimmed and used to assign taxonomy with Kraken2. Conditional forest (CF), regularized random forest (RRF), and support vector machine (SVM) models were implemented to predict Salmonella contamination. Model performance was assessed using 10-fold cross-validation repeated 10 times to quantify area under the curve (AUC) and Kappa score. CF models outperformed the other two algorithms based on AUC (0.86, CF; 0.81, RRF; 0.65, SVM) and Kappa score (0.53, CF; 0.41, RRF; 0.12, SVM). The taxa that were most informative for accurately predicting Salmonella contamination based on CF were compared to taxa identified by ALDEx2 as being differentially abundant between Salmonella-positive and -negative samples. CF and differential abundance tests both identified Aeromonas salmonicida (variable importance [VI] = 0.012) and Aeromonas sp. strain CA23 (VI = 0.025) as the two most informative taxa for predicting Salmonella contamination. Our findings suggest that microbiome-based models may provide an alternative to or complement existing water monitoring strategies. Similarly, the informative taxa identified in this study warrant further investigation as potential indicators of Salmonella contamination of agricultural water. IMPORTANCE Understanding the associations between surface water microbiome composition and the presence of foodborne pathogens, such as Salmonella, can facilitate the identification of novel indicators of Salmonella contamination. This study assessed the utility of microbiome data and three machine learning algorithms for predicting Salmonella contamination of Northeastern streams. The research reported here both expanded the knowledge on the microbiome composition of surface waters and identified putative novel indicators (i.e., Aeromonas species) for Salmonella in Northeastern streams. These putative indicators warrant further research to assess whether they are consistent indicators of Salmonella contamination across regions, waterways, and years not represented in the data set used in this study. Validated indicators identified using microbiome data may be used as targets in the development of rapid (e.g., PCR-based) detection assays for the assessment of microbial safety of agricultural surface waters.
format Online
Article
Text
id pubmed-10100987
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-101009872023-04-14 Conditional Forest Models Built Using Metagenomic Data Accurately Predicted Salmonella Contamination in Northeastern Streams Chung, Taejung Yan, Runan Weller, Daniel L. Kovac, Jasna Microbiol Spectr Research Article The use of water contaminated with Salmonella for produce production contributes to foodborne disease burden. To reduce human health risks, there is a need for novel, targeted approaches for assessing the pathogen status of agricultural water. We investigated the utility of water microbiome data for predicting Salmonella contamination of streams used to source water for produce production. Grab samples were collected from 60 New York streams in 2018 and tested for Salmonella. Separately, DNA was extracted from the samples and used for Illumina shotgun metagenomic sequencing. Reads were trimmed and used to assign taxonomy with Kraken2. Conditional forest (CF), regularized random forest (RRF), and support vector machine (SVM) models were implemented to predict Salmonella contamination. Model performance was assessed using 10-fold cross-validation repeated 10 times to quantify area under the curve (AUC) and Kappa score. CF models outperformed the other two algorithms based on AUC (0.86, CF; 0.81, RRF; 0.65, SVM) and Kappa score (0.53, CF; 0.41, RRF; 0.12, SVM). The taxa that were most informative for accurately predicting Salmonella contamination based on CF were compared to taxa identified by ALDEx2 as being differentially abundant between Salmonella-positive and -negative samples. CF and differential abundance tests both identified Aeromonas salmonicida (variable importance [VI] = 0.012) and Aeromonas sp. strain CA23 (VI = 0.025) as the two most informative taxa for predicting Salmonella contamination. Our findings suggest that microbiome-based models may provide an alternative to or complement existing water monitoring strategies. Similarly, the informative taxa identified in this study warrant further investigation as potential indicators of Salmonella contamination of agricultural water. IMPORTANCE Understanding the associations between surface water microbiome composition and the presence of foodborne pathogens, such as Salmonella, can facilitate the identification of novel indicators of Salmonella contamination. This study assessed the utility of microbiome data and three machine learning algorithms for predicting Salmonella contamination of Northeastern streams. The research reported here both expanded the knowledge on the microbiome composition of surface waters and identified putative novel indicators (i.e., Aeromonas species) for Salmonella in Northeastern streams. These putative indicators warrant further research to assess whether they are consistent indicators of Salmonella contamination across regions, waterways, and years not represented in the data set used in this study. Validated indicators identified using microbiome data may be used as targets in the development of rapid (e.g., PCR-based) detection assays for the assessment of microbial safety of agricultural surface waters. American Society for Microbiology 2023-03-22 /pmc/articles/PMC10100987/ /pubmed/36946722 http://dx.doi.org/10.1128/spectrum.00381-23 Text en Copyright © 2023 Chung et al. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/) .
spellingShingle Research Article
Chung, Taejung
Yan, Runan
Weller, Daniel L.
Kovac, Jasna
Conditional Forest Models Built Using Metagenomic Data Accurately Predicted Salmonella Contamination in Northeastern Streams
title Conditional Forest Models Built Using Metagenomic Data Accurately Predicted Salmonella Contamination in Northeastern Streams
title_full Conditional Forest Models Built Using Metagenomic Data Accurately Predicted Salmonella Contamination in Northeastern Streams
title_fullStr Conditional Forest Models Built Using Metagenomic Data Accurately Predicted Salmonella Contamination in Northeastern Streams
title_full_unstemmed Conditional Forest Models Built Using Metagenomic Data Accurately Predicted Salmonella Contamination in Northeastern Streams
title_short Conditional Forest Models Built Using Metagenomic Data Accurately Predicted Salmonella Contamination in Northeastern Streams
title_sort conditional forest models built using metagenomic data accurately predicted salmonella contamination in northeastern streams
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10100987/
https://www.ncbi.nlm.nih.gov/pubmed/36946722
http://dx.doi.org/10.1128/spectrum.00381-23
work_keys_str_mv AT chungtaejung conditionalforestmodelsbuiltusingmetagenomicdataaccuratelypredictedsalmonellacontaminationinnortheasternstreams
AT yanrunan conditionalforestmodelsbuiltusingmetagenomicdataaccuratelypredictedsalmonellacontaminationinnortheasternstreams
AT wellerdaniell conditionalforestmodelsbuiltusingmetagenomicdataaccuratelypredictedsalmonellacontaminationinnortheasternstreams
AT kovacjasna conditionalforestmodelsbuiltusingmetagenomicdataaccuratelypredictedsalmonellacontaminationinnortheasternstreams