Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons

BACKGROUND: The identification of genomic biomarkers is a key step towards improving diagnostic tests and therapies. We present a reference-free method for this task that relies on a k-mer representation of genomes and a machine learning algorithm that produces intelligible models. The method is com...

Bibliographic Details
Main authors: Drouin, Alexandre; Giguère, Sébastien; Déraspe, Maxime; Marchand, Mario; Tyers, Michael; Loo, Vivian G.; Bourgault, Anne-Marie; Laviolette, François; Corbeil, Jacques
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5037627/
https://www.ncbi.nlm.nih.gov/pubmed/27671088
http://dx.doi.org/10.1186/s12864-016-2889-6
author Drouin, Alexandre
Giguère, Sébastien
Déraspe, Maxime
Marchand, Mario
Tyers, Michael
Loo, Vivian G.
Bourgault, Anne-Marie
Laviolette, François
Corbeil, Jacques
collection PubMed
description BACKGROUND: The identification of genomic biomarkers is a key step towards improving diagnostic tests and therapies. We present a reference-free method for this task that relies on a k-mer representation of genomes and a machine learning algorithm that produces intelligible models. The method is computationally scalable and well-suited for whole genome sequencing studies. RESULTS: The method was validated by generating models that predict the antibiotic resistance of C. difficile, M. tuberculosis, P. aeruginosa, and S. pneumoniae for 17 antibiotics. The obtained models are accurate, faithful to the biological pathways targeted by the antibiotics, and they provide insight into the process of resistance acquisition. Moreover, a theoretical analysis of the method revealed tight statistical guarantees on the accuracy of the obtained models, supporting its relevance for genomic biomarker discovery. CONCLUSIONS: Our method allows the generation of accurate and interpretable predictive models of phenotypes, which rely on a small set of genomic variations. The method is not limited to predicting antibiotic resistance in bacteria and is applicable to a variety of organisms and phenotypes. Kover, an efficient implementation of our method, is open-source and should guide biological efforts to understand a plethora of phenotypes (http://github.com/aldro61/kover/). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2889-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5037627
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5037627 2016-10-05 Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons. Drouin, Alexandre; Giguère, Sébastien; Déraspe, Maxime; Marchand, Mario; Tyers, Michael; Loo, Vivian G.; Bourgault, Anne-Marie; Laviolette, François; Corbeil, Jacques. BMC Genomics. Methodology Article.
BioMed Central 2016-09-26 /pmc/articles/PMC5037627/ /pubmed/27671088 http://dx.doi.org/10.1186/s12864-016-2889-6 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons
topic Methodology Article