Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions

Bibliographic Details
Main authors: Lees, John A., Mai, T. Tien, Galardini, Marco, Wheeler, Nicole E., Horsfield, Samuel T., Parkhill, Julian, Corander, Jukka
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2020
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7343994/
https://www.ncbi.nlm.nih.gov/pubmed/32636251
http://dx.doi.org/10.1128/mBio.01344-20
Description: Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations create challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. The variants selected by our joint modeling approach overlap substantially with those identified by previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold.
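
The description above names two ingredients: unitig presence/absence as the variant representation and elastic net penalization as the fitting method. The following is a minimal pure-Python sketch of elastic net regression by cyclic coordinate descent on a hypothetical toy unitig matrix, assuming a continuous phenotype and the glmnet-style objective (1/2n) * RSS + lam * (alpha * |b|_1 + (1 - alpha)/2 * |b|_2^2). It is not the authors' implementation; the function names, the toy data, and the parameter values (`lam`, `alpha`) are illustrative assumptions.

```python
"""Toy elastic net regression on unitig presence/absence (illustrative sketch)."""
from typing import List, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator S(z, gamma) used by the L1 part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(
    X: List[List[float]],
    y: List[float],
    lam: float,
    alpha: float,
    n_iter: int = 1000,
    tol: float = 1e-10,
) -> Tuple[float, List[float]]:
    """Fit y ~ b0 + X.b by cyclic coordinate descent.

    Minimises (1/2n) * RSS + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2).
    The intercept b0 is not penalised.
    """
    n, p = len(X), len(X[0])
    # Centre each column and the phenotype so the intercept drops out of the updates.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals while every b_j is 0
    z = [sum(xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column (e.g. a unitig in every isolate) carries no signal
            # Covariance of unitig j with the partial residual that excludes unitig j.
            rho = sum(xc[i][j] * resid[i] for i in range(n)) / n + z[j] * b[j]
            new_bj = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1.0 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xc[i][j] * delta  # keep residuals in sync with b
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b0 = y_mean - sum(x_mean[j] * b[j] for j in range(p))
    return b0, b


# Hypothetical toy data: rows are isolates, columns are unitigs (1 = present).
# Unitig 0 drives the phenotype; unitigs 1 and 2 are unrelated; unitig 3 is core.
X_TOY = [
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
Y_TOY = [3.0, 3.0, 3.0, 3.0, 1.0, 1.0, 1.0, 1.0]  # y = 1 + 2 * unitig_0

if __name__ == "__main__":
    intercept, coefs = elastic_net(X_TOY, Y_TOY, lam=0.1, alpha=0.5)
    print("intercept:", intercept)
    print("coefficients:", coefs)
```

With lam=0.1 and alpha=0.5 the causal unitig receives a shrunken coefficient of about 1.5 (the true effect is 2.0), while the unrelated and core unitigs are set exactly to zero; with lam=0.0 the fit reduces to least squares and recovers an intercept of 1 and a coefficient of 2. The objective matches the one documented for scikit-learn's `ElasticNet`, with `lam` playing the role of its `alpha` and `alpha` the role of its `l1_ratio`; a pangenome-scale analysis would need an optimized solver rather than this loop.
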
Journal: mBio (Research Article), 2020-07-07
License: Copyright © 2020 Lees et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).