Cargando…
Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease
Highly dimensional data generated from bacterial whole-genome sequencing is providing an unprecedented scale of information that requires an appropriate statistical analysis framework to infer biological function from populations of genomes. The application of genome-wide association study (GWAS) me...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
MDPI
2020
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7232492/ https://www.ncbi.nlm.nih.gov/pubmed/32290186 http://dx.doi.org/10.3390/microorganisms8040549 |
_version_ | 1783535399974469632 |
---|---|
author | Bandoy, DJ Darwin R. Weimer, Bart C. |
author_facet | Bandoy, DJ Darwin R. Weimer, Bart C. |
author_sort | Bandoy, DJ Darwin R. |
collection | PubMed |
description | Highly dimensional data generated from bacterial whole-genome sequencing is providing an unprecedented scale of information that requires an appropriate statistical analysis framework to infer biological function from populations of genomes. The application of genome-wide association study (GWAS) methods is an appropriate framework for bacterial population genome analysis that yields a list of candidate genes associated with a phenotype, but it provides an unranked measure of importance. Here, we validated a novel framework to define infection mechanism using the combination of GWAS, machine learning, and bacterial population genomics that ranked allelic variants that accurately identified disease. This approach parsed a dataset of 1.2 million single nucleotide polymorphisms (SNPs) and indels that resulted in an importance ranked list of associated alleles of porA in Campylobacter jejuni using spatiotemporal analysis over 30 years. We validated this approach using previously proven laboratory experimental alleles from an in vivo guinea pig abortion model. This framework, termed μPathML, defined intestinal and extraintestinal groups that have differential allelic porA variants that cause abortion. Divergent variants containing indels that defeated automated annotation were rescued using biological context and knowledge that resulted in defining rare, divergent variants that were maintained in the population over two continents and 30 years. This study defines the capability of machine learning coupled with GWAS and population genomics to simultaneously identify and rank alleles to define their role in infectious disease mechanisms. |
format | Online Article Text |
id | pubmed-7232492 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-72324922020-05-22 Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease Bandoy, DJ Darwin R. Weimer, Bart C. Microorganisms Article Highly dimensional data generated from bacterial whole-genome sequencing is providing an unprecedented scale of information that requires an appropriate statistical analysis framework to infer biological function from populations of genomes. The application of genome-wide association study (GWAS) methods is an appropriate framework for bacterial population genome analysis that yields a list of candidate genes associated with a phenotype, but it provides an unranked measure of importance. Here, we validated a novel framework to define infection mechanism using the combination of GWAS, machine learning, and bacterial population genomics that ranked allelic variants that accurately identified disease. This approach parsed a dataset of 1.2 million single nucleotide polymorphisms (SNPs) and indels that resulted in an importance ranked list of associated alleles of porA in Campylobacter jejuni using spatiotemporal analysis over 30 years. We validated this approach using previously proven laboratory experimental alleles from an in vivo guinea pig abortion model. This framework, termed μPathML, defined intestinal and extraintestinal groups that have differential allelic porA variants that cause abortion. Divergent variants containing indels that defeated automated annotation were rescued using biological context and knowledge that resulted in defining rare, divergent variants that were maintained in the population over two continents and 30 years. This study defines the capability of machine learning coupled with GWAS and population genomics to simultaneously identify and rank alleles to define their role in infectious disease mechanisms. MDPI 2020-04-10 /pmc/articles/PMC7232492/ /pubmed/32290186 http://dx.doi.org/10.3390/microorganisms8040549 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Bandoy, DJ Darwin R. Weimer, Bart C. Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease |
title | Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease |
title_full | Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease |
title_fullStr | Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease |
title_full_unstemmed | Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease |
title_short | Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease |
title_sort | biological machine learning combined with campylobacter population genomics reveals virulence gene allelic variants cause disease |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7232492/ https://www.ncbi.nlm.nih.gov/pubmed/32290186 http://dx.doi.org/10.3390/microorganisms8040549 |
work_keys_str_mv | AT bandoydjdarwinr biologicalmachinelearningcombinedwithcampylobacterpopulationgenomicsrevealsvirulencegeneallelicvariantscausedisease AT weimerbartc biologicalmachinelearningcombinedwithcampylobacterpopulationgenomicsrevealsvirulencegeneallelicvariantscausedisease |