Cargando…

Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease

Highly dimensional data generated from bacterial whole-genome sequencing is providing an unprecedented scale of information that requires an appropriate statistical analysis framework to infer biological function from populations of genomes. The application of genome-wide association study (GWAS) me...

Descripción completa

Detalles Bibliográficos
Autores principales: Bandoy, DJ Darwin R., Weimer, Bart C.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7232492/
https://www.ncbi.nlm.nih.gov/pubmed/32290186
http://dx.doi.org/10.3390/microorganisms8040549
_version_ 1783535399974469632
author Bandoy, DJ Darwin R.
Weimer, Bart C.
author_facet Bandoy, DJ Darwin R.
Weimer, Bart C.
author_sort Bandoy, DJ Darwin R.
collection PubMed
description Highly dimensional data generated from bacterial whole-genome sequencing is providing an unprecedented scale of information that requires an appropriate statistical analysis framework to infer biological function from populations of genomes. The application of genome-wide association study (GWAS) methods is an appropriate framework for bacterial population genome analysis that yields a list of candidate genes associated with a phenotype, but it provides an unranked measure of importance. Here, we validated a novel framework to define infection mechanism using the combination of GWAS, machine learning, and bacterial population genomics that ranked allelic variants that accurately identified disease. This approach parsed a dataset of 1.2 million single nucleotide polymorphisms (SNPs) and indels that resulted in an importance ranked list of associated alleles of porA in Campylobacter jejuni using spatiotemporal analysis over 30 years. We validated this approach using previously proven laboratory experimental alleles from an in vivo guinea pig abortion model. This framework, termed μPathML, defined intestinal and extraintestinal groups that have differential allelic porA variants that cause abortion. Divergent variants containing indels that defeated automated annotation were rescued using biological context and knowledge that resulted in defining rare, divergent variants that were maintained in the population over two continents and 30 years. This study defines the capability of machine learning coupled with GWAS and population genomics to simultaneously identify and rank alleles to define their role in infectious disease mechanisms.
format Online
Article
Text
id pubmed-7232492
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-72324922020-05-22 Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease Bandoy, DJ Darwin R. Weimer, Bart C. Microorganisms Article Highly dimensional data generated from bacterial whole-genome sequencing is providing an unprecedented scale of information that requires an appropriate statistical analysis framework to infer biological function from populations of genomes. The application of genome-wide association study (GWAS) methods is an appropriate framework for bacterial population genome analysis that yields a list of candidate genes associated with a phenotype, but it provides an unranked measure of importance. Here, we validated a novel framework to define infection mechanism using the combination of GWAS, machine learning, and bacterial population genomics that ranked allelic variants that accurately identified disease. This approach parsed a dataset of 1.2 million single nucleotide polymorphisms (SNPs) and indels that resulted in an importance ranked list of associated alleles of porA in Campylobacter jejuni using spatiotemporal analysis over 30 years. We validated this approach using previously proven laboratory experimental alleles from an in vivo guinea pig abortion model. This framework, termed μPathML, defined intestinal and extraintestinal groups that have differential allelic porA variants that cause abortion. Divergent variants containing indels that defeated automated annotation were rescued using biological context and knowledge that resulted in defining rare, divergent variants that were maintained in the population over two continents and 30 years. This study defines the capability of machine learning coupled with GWAS and population genomics to simultaneously identify and rank alleles to define their role in infectious disease mechanisms. MDPI 2020-04-10 /pmc/articles/PMC7232492/ /pubmed/32290186 http://dx.doi.org/10.3390/microorganisms8040549 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Bandoy, DJ Darwin R.
Weimer, Bart C.
Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease
title Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease
title_full Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease
title_fullStr Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease
title_full_unstemmed Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease
title_short Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease
title_sort biological machine learning combined with campylobacter population genomics reveals virulence gene allelic variants cause disease
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7232492/
https://www.ncbi.nlm.nih.gov/pubmed/32290186
http://dx.doi.org/10.3390/microorganisms8040549
work_keys_str_mv AT bandoydjdarwinr biologicalmachinelearningcombinedwithcampylobacterpopulationgenomicsrevealsvirulencegeneallelicvariantscausedisease
AT weimerbartc biologicalmachinelearningcombinedwithcampylobacterpopulationgenomicsrevealsvirulencegeneallelicvariantscausedisease