Cargando…
Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes
Group A Streptococcus is a globally significant human pathogen. The extensive variability of the GAS genome, virulence phenotypes and clinical outcomes, render it an excellent candidate for the application of genotype-phenotype association studies in the era of whole-genome sequencing. We have catal...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Frontiers Media S.A.
2021
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8739889/ https://www.ncbi.nlm.nih.gov/pubmed/35004362 http://dx.doi.org/10.3389/fcimb.2021.809560 |
_version_ | 1784629197316030464 |
---|---|
author | Buckley, Sean J. Harvey, Robert J. |
author_facet | Buckley, Sean J. Harvey, Robert J. |
author_sort | Buckley, Sean J. |
collection | PubMed |
description | Group A Streptococcus is a globally significant human pathogen. The extensive variability of the GAS genome, virulence phenotypes and clinical outcomes, render it an excellent candidate for the application of genotype-phenotype association studies in the era of whole-genome sequencing. We have catalogued the distribution and diversity of the transcription regulators of GAS, and employed phylogenetics, concordance metrics and machine learning (ML) to test for associations. In this review, we communicate the lessons learnt in the context of the recent bacteria genotype-phenotype association studies of others that have utilised both genome-wide association studies (GWAS) and ML. We envisage a promising future for the application GWAS in bacteria genotype-phenotype association studies and foresee the increasing use of ML. However, progress in this field is hindered by several outstanding bottlenecks. These include the shortcomings that are observed when GWAS techniques that have been fine-tuned on human genomes, are applied to bacterial genomes. Furthermore, there is a deficit of easy-to-use end-to-end workflows, and a lag in the collection of detailed phenotype and clinical genomic metadata. We propose a novel quality control protocol for the collection of high-quality GAS virulence phenotype coupled to clinical outcome data. Finally, we incorporate this protocol into a workflow for testing genotype-phenotype associations using ML and ‘linked’ patient-microbe genome sets that better represent the infection event. |
format | Online Article Text |
id | pubmed-8739889 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-87398892022-01-08 Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes Buckley, Sean J. Harvey, Robert J. Front Cell Infect Microbiol Cellular and Infection Microbiology Group A Streptococcus is a globally significant human pathogen. The extensive variability of the GAS genome, virulence phenotypes and clinical outcomes, render it an excellent candidate for the application of genotype-phenotype association studies in the era of whole-genome sequencing. We have catalogued the distribution and diversity of the transcription regulators of GAS, and employed phylogenetics, concordance metrics and machine learning (ML) to test for associations. In this review, we communicate the lessons learnt in the context of the recent bacteria genotype-phenotype association studies of others that have utilised both genome-wide association studies (GWAS) and ML. We envisage a promising future for the application GWAS in bacteria genotype-phenotype association studies and foresee the increasing use of ML. However, progress in this field is hindered by several outstanding bottlenecks. These include the shortcomings that are observed when GWAS techniques that have been fine-tuned on human genomes, are applied to bacterial genomes. Furthermore, there is a deficit of easy-to-use end-to-end workflows, and a lag in the collection of detailed phenotype and clinical genomic metadata. We propose a novel quality control protocol for the collection of high-quality GAS virulence phenotype coupled to clinical outcome data. Finally, we incorporate this protocol into a workflow for testing genotype-phenotype associations using ML and ‘linked’ patient-microbe genome sets that better represent the infection event. Frontiers Media S.A. 2021-12-24 /pmc/articles/PMC8739889/ /pubmed/35004362 http://dx.doi.org/10.3389/fcimb.2021.809560 Text en Copyright © 2021 Buckley and Harvey https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Cellular and Infection Microbiology Buckley, Sean J. Harvey, Robert J. Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes |
title | Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes
|
title_full | Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes
|
title_fullStr | Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes
|
title_full_unstemmed | Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes
|
title_short | Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes
|
title_sort | lessons learnt from using the machine learning random forest algorithm to predict virulence in streptococcus pyogenes |
topic | Cellular and Infection Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8739889/ https://www.ncbi.nlm.nih.gov/pubmed/35004362 http://dx.doi.org/10.3389/fcimb.2021.809560 |
work_keys_str_mv | AT buckleyseanj lessonslearntfromusingthemachinelearningrandomforestalgorithmtopredictvirulenceinstreptococcuspyogenes AT harveyrobertj lessonslearntfromusingthemachinelearningrandomforestalgorithmtopredictvirulenceinstreptococcuspyogenes |