Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection
BACKGROUND: Several studies demonstrated the feasibility of predicting bacterial antibiotic resistance phenotypes from whole-genome sequences, the prediction process usually amounting to detecting the presence of genes involved in antibiotic resistance mechanisms, or of specific mutations, previousl...
Main Authors: | Mahé, Pierre, Tournoud, Maud |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6192184/ https://www.ncbi.nlm.nih.gov/pubmed/30332990 http://dx.doi.org/10.1186/s12859-018-2403-z |
_version_ | 1783363860910047232 |
---|---|
author | Mahé, Pierre Tournoud, Maud |
author_facet | Mahé, Pierre Tournoud, Maud |
author_sort | Mahé, Pierre |
collection | PubMed |
description | BACKGROUND: Several studies demonstrated the feasibility of predicting bacterial antibiotic resistance phenotypes from whole-genome sequences, the prediction process usually amounting to detecting the presence of genes involved in antibiotic resistance mechanisms, or of specific mutations, previously identified from a training panel of strains, within these genes. We address the problem from the supervised statistical learning perspective, not relying on prior information about such resistance factors. We rely on a k-mer based genotyping scheme and a logistic regression model, thereby combining several k-mers into a probabilistic model. To identify a small yet predictive set of k-mers, we rely on the stability selection approach (Meinshausen et al., J R Stat Soc Ser B 72:417–73, 2010), that consists in penalizing logistic regression models with a Lasso penalty, coupled with extensive resampling procedures. RESULTS: Using public datasets, we applied the resulting classifiers to two bacterial species and achieved predictive performance equivalent to state of the art. The models are extremely sparse, involving 1 to 8 k-mers per antibiotic, hence are remarkably easy and fast to evaluate on new genomes (from raw reads to assemblies). CONCLUSION: Our proof of concept therefore demonstrates that stability selection is a powerful approach to investigate bacterial genotype-phenotype relationships. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2403-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6192184 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6192184 2018-10-22 Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection Mahé, Pierre Tournoud, Maud BMC Bioinformatics Methodology Article BACKGROUND: Several studies demonstrated the feasibility of predicting bacterial antibiotic resistance phenotypes from whole-genome sequences, the prediction process usually amounting to detecting the presence of genes involved in antibiotic resistance mechanisms, or of specific mutations, previously identified from a training panel of strains, within these genes. We address the problem from the supervised statistical learning perspective, not relying on prior information about such resistance factors. We rely on a k-mer based genotyping scheme and a logistic regression model, thereby combining several k-mers into a probabilistic model. To identify a small yet predictive set of k-mers, we rely on the stability selection approach (Meinshausen et al., J R Stat Soc Ser B 72:417–73, 2010), that consists in penalizing logistic regression models with a Lasso penalty, coupled with extensive resampling procedures. RESULTS: Using public datasets, we applied the resulting classifiers to two bacterial species and achieved predictive performance equivalent to state of the art. The models are extremely sparse, involving 1 to 8 k-mers per antibiotic, hence are remarkably easy and fast to evaluate on new genomes (from raw reads to assemblies). CONCLUSION: Our proof of concept therefore demonstrates that stability selection is a powerful approach to investigate bacterial genotype-phenotype relationships. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2403-z) contains supplementary material, which is available to authorized users. 
BioMed Central 2018-10-17 /pmc/articles/PMC6192184/ /pubmed/30332990 http://dx.doi.org/10.1186/s12859-018-2403-z Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Mahé, Pierre Tournoud, Maud Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection |
title | Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection |
title_full | Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection |
title_fullStr | Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection |
title_full_unstemmed | Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection |
title_short | Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection |
title_sort | predicting bacterial resistance from whole-genome sequences using k-mers and stability selection |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6192184/ https://www.ncbi.nlm.nih.gov/pubmed/30332990 http://dx.doi.org/10.1186/s12859-018-2403-z |
work_keys_str_mv | AT mahepierre predictingbacterialresistancefromwholegenomesequencesusingkmersandstabilityselection AT tournoudmaud predictingbacterialresistancefromwholegenomesequencesusingkmersandstabilityselection |
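
The abstract in the description field outlines a pipeline: genotype each strain by k-mer presence, fit Lasso-penalized logistic regressions on many resampled subsets, and keep only the k-mers that are selected consistently. The sketch below illustrates that general idea on synthetic data and is not the authors' implementation. Everything in it is an assumption made for illustration: the toy genomes, the hypothetical `MARKER` sequence, the k-mer length `K`, the single fixed penalty `lam`, the plain proximal-gradient solver, the class-stratified half-sampling, and the 0.9 frequency threshold. It also ignores reverse complements, which a real genotyping scheme would normally handle.

```python
import math
import random

K = 8  # k-mer length (arbitrary choice for this toy example)
MARKER = "GATTACAGGCTA"  # hypothetical resistance determinant, not from the paper


def kmers(seq, k=K):
    """Return the set of k-mers present in a sequence (presence/absence only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v towards zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_l1_logistic(rows, y, n_features, lam=0.08, lr=0.05, steps=200):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    rows[i] lists the indices of the binary features present in sample i.
    The intercept b is left unpenalized.
    """
    w, b, n = [0.0] * n_features, 0.0, len(rows)
    for _ in range(steps):
        grad_w, grad_b = {}, 0.0
        for feats, label in zip(rows, y):
            z = b + sum(w[j] for j in feats)
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted prob. minus label
            grad_b += err / n
            for j in feats:
                grad_w[j] = grad_w.get(j, 0.0) + err / n
        b -= lr * grad_b
        # Features absent from every row have zero weight and zero gradient,
        # so only the features seen in this sample set need updating.
        for j, g in grad_w.items():
            w[j] = soft_threshold(w[j] - lr * g, lr * lam)
    return w


def stability_selection(rows, y, n_features, n_rounds=20, seed=0):
    """Selection frequency of each feature over class-stratified half-samples."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    counts = [0] * n_features
    for _ in range(n_rounds):
        idx = rng.sample(pos, len(pos) // 2) + rng.sample(neg, len(neg) // 2)
        w = fit_l1_logistic([rows[i] for i in idx], [y[i] for i in idx], n_features)
        for j, wj in enumerate(w):
            counts[j] += wj != 0.0
    return [c / n_rounds for c in counts]


# Synthetic panel: 8 "resistant" strains carry MARKER, 8 "susceptible" do not.
data_rng = random.Random(1)


def random_dna(n):
    return "".join(data_rng.choice("ACGT") for _ in range(n))


genomes = [random_dna(30) + MARKER for _ in range(8)]
genomes += [random_dna(30) for _ in range(8)]
labels = [1] * 8 + [0] * 8

vocab = sorted(set().union(*(kmers(g) for g in genomes)))
index = {km: j for j, km in enumerate(vocab)}
rows = [[index[km] for km in kmers(g)] for g in genomes]

freqs = dict(zip(vocab, stability_selection(rows, labels, len(vocab))))
stable = sorted(km for km, f in freqs.items() if f >= 0.9)
print(len(vocab), "candidate k-mers;", len(stable), "stable:", stable)
```

Because every resistant strain in every subsample carries the marker, its k-mers receive a nonzero weight in each fit, whereas strain-specific background k-mers are shrunk to zero by the penalty. Scoring a new genome then only requires checking for the few retained k-mers, which is why sparse models of this kind are cheap to evaluate on reads or assemblies.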