Machine Learning Predicts Accurately Mycobacterium tuberculosis Drug Resistance From Whole Genome Sequencing Data

Background: Tuberculosis disease, caused by Mycobacterium tuberculosis, is a major public health problem. The emergence of M. tuberculosis strains resistant to existing treatments threatens to derail control efforts. Resistance is mainly conferred by mutations in genes coding for drug targets or con...

Bibliographic Details
Main authors:	Deelder, Wouter; Christakoudi, Sofia; Phelan, Jody; Benavente, Ernest Diez; Campino, Susana; McNerney, Ruth; Palla, Luigi; Clark, Taane G.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2019
Subjects:	Genetics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6775242/ https://www.ncbi.nlm.nih.gov/pubmed/31616478 http://dx.doi.org/10.3389/fgene.2019.00922

author	Deelder, Wouter; Christakoudi, Sofia; Phelan, Jody; Benavente, Ernest Diez; Campino, Susana; McNerney, Ruth; Palla, Luigi; Clark, Taane G.
collection	PubMed
description	Background: Tuberculosis disease, caused by Mycobacterium tuberculosis, is a major public health problem. The emergence of M. tuberculosis strains resistant to existing treatments threatens to derail control efforts. Resistance is mainly conferred by mutations in genes coding for drug targets or converting enzymes, but our knowledge of these mutations is incomplete. Whole genome sequencing (WGS) is an increasingly common approach to rapidly characterize isolates and identify mutations predicting antimicrobial resistance and thereby providing a diagnostic tool to assist clinical decision making. Methods: We applied machine learning approaches to 16,688 M. tuberculosis isolates that have undergone WGS and laboratory drug-susceptibility testing (DST) across 14 antituberculosis drugs, with 22.5% of samples being multidrug resistant and 2.1% being extensively drug resistant. We used non-parametric classification-tree and gradient-boosted-tree models to predict drug resistance and uncover any associated novel putative mutations. We fitted separate models for each drug, with and without “co-occurrent resistance” markers known to cause resistance to drugs other than the one of interest. Predictive performance was measured using sensitivity, specificity, and the area under the receiver operating characteristic curve, assuming DST results as the gold standard. Results: The predictive performance was highest for resistance to first-line drugs, amikacin, kanamycin, ciprofloxacin, moxifloxacin, and multidrug-resistant tuberculosis (area under the receiver operating characteristic curve above 96%), and lowest for third-line drugs such as D-cycloserine and para-aminosalicylic acid (area under the curve below 85%). The inclusion of co-occurrent resistance markers led to improved performance for some drugs and superior results when compared to similar models in other large-scale studies, which had smaller sample sizes. Overall, the gradient-boosted-tree models performed better than the classification-tree models. The mutation-rank analysis detected no new single nucleotide polymorphisms linked to drug resistance. Discordance between DST and genotypically inferred resistance may be explained by DST errors, novel rare mutations, hetero-resistance, and nongenomic drivers such as efflux-pump upregulation. Conclusion: Our work demonstrates the utility of machine learning as a flexible approach to drug resistance prediction that is able to accommodate a much larger number of predictors and to summarize their predictive ability, thus assisting clinical decision making and single nucleotide polymorphism detection in an era of increasing WGS data generation.
format	Online Article Text
id	pubmed-6775242
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-6775242 2019-10-15 Machine Learning Predicts Accurately Mycobacterium tuberculosis Drug Resistance From Whole Genome Sequencing Data. Deelder, Wouter; Christakoudi, Sofia; Phelan, Jody; Benavente, Ernest Diez; Campino, Susana; McNerney, Ruth; Palla, Luigi; Clark, Taane G. Front Genet. Genetics. Frontiers Media S.A. 2019-09-26. /pmc/articles/PMC6775242/ /pubmed/31616478 http://dx.doi.org/10.3389/fgene.2019.00922 Text en. Copyright © 2019 Deelder, Christakoudi, Phelan, Benavente, Campino, McNerney, Palla and Clark. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Machine Learning Predicts Accurately Mycobacterium tuberculosis Drug Resistance From Whole Genome Sequencing Data
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6775242/ https://www.ncbi.nlm.nih.gov/pubmed/31616478 http://dx.doi.org/10.3389/fgene.2019.00922