
Machine Learning Methods as a Tool for Predicting Risk of Illness Applying Next‐Generation Sequencing Data

Next‐generation sequencing (NGS) data present an untapped potential to improve microbial risk assessment (MRA) through increased specificity and redefinition of the hazard. Most of the MRA models do not account for differences in survivability and virulence among strains. The potential of machine le...


Bibliographic Details
Main authors: Njage, Patrick Murigu Kamau, Henri, Clementine, Leekitcharoenphon, Pimlapas, Mistou, Michel‐Yves, Hendriksen, Rene S., Hald, Tine
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7379936/
https://www.ncbi.nlm.nih.gov/pubmed/30462833
http://dx.doi.org/10.1111/risa.13239
author Njage, Patrick Murigu Kamau
Henri, Clementine
Leekitcharoenphon, Pimlapas
Mistou, Michel‐Yves
Hendriksen, Rene S.
Hald, Tine
collection PubMed
description Next‐generation sequencing (NGS) data present an untapped potential to improve microbial risk assessment (MRA) through increased specificity and redefinition of the hazard. Most of the MRA models do not account for differences in survivability and virulence among strains. The potential of machine learning algorithms for predicting the risk/health burden at the population level while inputting large and complex NGS data was explored with Listeria monocytogenes as a case study. Listeria data consisted of a percentage similarity matrix from genome assemblies of 38 and 207 strains of clinical and food origin, respectively. Basic Local Alignment Search Tool (BLAST) was used to align the assemblies against a database of 136 virulence and stress resistance genes. The outcome variable was frequency of illness, which is the percentage of reported cases associated with each strain. These frequency data were discretized into seven ordinal outcome categories and used for supervised machine learning and model selection from five ensemble algorithms. There was no significant difference in accuracy between the models, and support vector machine with linear kernel was chosen for further inference (accuracy of 89% [95% CI: 68%, 97%]). The virulence genes FAM002725, FAM002728, FAM002729, InlF, InlJ, InlK, IisY, IisD, IisX, IisH, IisB, lmo2026, and FAM003296 were important predictors of higher frequency of illness. InlF was uniquely truncated in the sequence type 121 strains. Most important risk predictor genes occurred at highest prevalence among strains from ready‐to‐eat, dairy, and composite foods. We foresee that the findings and approaches described offer the potential for rethinking the current approaches in MRA.
format Online
Article
Text
id pubmed-7379936
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-7379936 2020-07-27 Risk Anal Original Research Articles John Wiley and Sons Inc. 2018-11-21 2019-06 /pmc/articles/PMC7379936/ /pubmed/30462833 http://dx.doi.org/10.1111/risa.13239 Text en © 2018 The Authors. Risk Analysis published by Wiley Periodicals, Inc. on behalf of Society for Risk Analysis. This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc-nd/4.0/ License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
title Machine Learning Methods as a Tool for Predicting Risk of Illness Applying Next‐Generation Sequencing Data
topic Original Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7379936/
https://www.ncbi.nlm.nih.gov/pubmed/30462833
http://dx.doi.org/10.1111/risa.13239