
Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection


Bibliographic Details
Main Authors: Mahé, Pierre, Tournoud, Maud
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6192184/
https://www.ncbi.nlm.nih.gov/pubmed/30332990
http://dx.doi.org/10.1186/s12859-018-2403-z
author Mahé, Pierre
Tournoud, Maud
collection PubMed
description BACKGROUND: Several studies have demonstrated the feasibility of predicting bacterial antibiotic resistance phenotypes from whole-genome sequences. The prediction process usually amounts to detecting the presence of genes involved in antibiotic resistance mechanisms, or of specific mutations within these genes, previously identified from a training panel of strains. We address the problem from a supervised statistical learning perspective, without relying on prior information about such resistance factors. We use a k-mer based genotyping scheme and a logistic regression model, thereby combining several k-mers into a probabilistic model. To identify a small yet predictive set of k-mers, we rely on the stability selection approach (Meinshausen et al., J R Stat Soc Ser B 72:417–73, 2010), which consists of fitting logistic regression models with a Lasso penalty, coupled with extensive resampling procedures. RESULTS: Using public datasets, we applied the resulting classifiers to two bacterial species and achieved predictive performance equivalent to the state of the art. The models are extremely sparse, involving 1 to 8 k-mers per antibiotic, and are therefore easy and fast to evaluate on new genomes (from raw reads to assemblies). CONCLUSION: Our proof of concept therefore demonstrates that stability selection is a powerful approach for investigating bacterial genotype-phenotype relationships. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2403-z) contains supplementary material, which is available to authorized users.
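The approach summarized in the description (k-mer presence/absence features, Lasso-penalized logistic regression, repeated resampling) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy genomes, the planted marker `GATTACA`, the single penalty value `lam=0.05`, the half-size subsamples, the 0.8 cutoff, and all function names are assumptions made for illustration. The sketch also uses one fixed penalty rather than a range of penalties, which simplifies stability selection as usually described.

```python
import math
import random

def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def fit_l1_logistic(rows, y, p, lam, steps=150):
    """Lasso-penalized logistic regression via proximal gradient descent.

    rows[i] lists the indices of the k-mers present in genome i (binary features).
    """
    n = len(rows)
    # Step size 1/L, where L is bounded by 0.25 * (largest squared row norm,
    # counting the intercept as one extra feature).
    lr = 4.0 / max(1 + len(r) for r in rows)
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * p, 0.0
        for r, yi in zip(rows, y):
            z = b + sum(w[j] for j in r)
            err = (1.0 / (1.0 + math.exp(-z)) - yi) / n
            grad_b += err
            for j in r:
                grad_w[j] += err
        b -= lr * grad_b  # the intercept is not penalized
        for j in range(p):
            v = w[j] - lr * grad_w[j]
            # Soft-thresholding is the proximal step for the Lasso penalty.
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b

def stability_selection(rows, y, p, lam, n_resamples=20, seed=0):
    """Fraction of half-size subsamples in which each k-mer gets a nonzero weight."""
    rng = random.Random(seed)
    n = len(rows)
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)
        w, _ = fit_l1_logistic([rows[i] for i in idx], [y[i] for i in idx], p, lam)
        for j in range(p):
            counts[j] += (w[j] != 0.0)
    return [c / n_resamples for c in counts]

# Toy data, made up for illustration: 10 of 20 genomes carry a planted marker 7-mer.
K, MARKER = 7, "GATTACA"
rng = random.Random(1)
genomes = []
for i in range(20):
    background = "".join(rng.choice("ACGT") for _ in range(12))
    genomes.append(background + (MARKER if i < 10 else ""))
# The toy phenotype is defined by marker presence, so the marker is perfectly predictive.
labels = [int(MARKER in g) for g in genomes]

# Binary genotype: which k-mers of the shared vocabulary occur in each genome.
vocab = sorted(set().union(*(kmers(g, K) for g in genomes)))
index = {km: j for j, km in enumerate(vocab)}
rows = [sorted(index[km] for km in kmers(g, K)) for g in genomes]

freq = dict(zip(vocab, stability_selection(rows, labels, len(vocab), lam=0.05)))
stable = [km for km, f in freq.items() if f >= 0.8]  # 0.8 is an arbitrary cutoff
print(stable)
```

K-mers that are selected in most subsamples are kept, while k-mers that enter the model only occasionally are discarded. A model restricted to the retained k-mers can then be evaluated on a new genome by checking only whether those few k-mers are present.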
format Online
Article
Text
id pubmed-6192184
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6192184 2018-10-22 Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection Mahé, Pierre Tournoud, Maud BMC Bioinformatics Methodology Article
BioMed Central 2018-10-17 /pmc/articles/PMC6192184/ /pubmed/30332990 http://dx.doi.org/10.1186/s12859-018-2403-z Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection
topic Methodology Article