Beyond multidrug resistance: Leveraging rare variants with machine and statistical learning models in Mycobacterium tuberculosis resistance prediction

BACKGROUND: The diagnosis of multidrug resistant and extensively drug resistant tuberculosis is a global health priority. Whole genome sequencing of clinical Mycobacterium tuberculosis isolates promises to circumvent the long wait times and limited scope of conventional phenotypic antimicrobial susc...

Bibliographic Details
Main authors: Chen, Michael L., Doddi, Akshith, Royer, Jimmy, Freschi, Luca, Schito, Marco, Ezewudo, Matthew, Kohane, Isaac S., Beam, Andrew, Farhat, Maha
Format: Online Article Text
Language: English
Published: Elsevier 2019
Subjects: Research paper
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6557804/
https://www.ncbi.nlm.nih.gov/pubmed/31047860
http://dx.doi.org/10.1016/j.ebiom.2019.04.016
author Chen, Michael L.
Doddi, Akshith
Royer, Jimmy
Freschi, Luca
Schito, Marco
Ezewudo, Matthew
Kohane, Isaac S.
Beam, Andrew
Farhat, Maha
collection PubMed
description BACKGROUND: The diagnosis of multidrug resistant and extensively drug resistant tuberculosis is a global health priority. Whole genome sequencing of clinical Mycobacterium tuberculosis isolates promises to circumvent the long wait times and limited scope of conventional phenotypic antimicrobial susceptibility testing, but gaps remain in predicting phenotype accurately from genotypic data, especially for certain drugs. Our primary aim was to explore statistical learning algorithms and genetic predictor sets using a rich dataset to build a high-performing, fast-predicting model to detect anti-tuberculosis drug resistance.
METHODS: We collected targeted or whole genome sequencing and conventional drug resistance phenotyping data from 3601 Mycobacterium tuberculosis strains enriched for resistance to first- and second-line drugs, with 1228 multidrug resistant strains. We investigated the utility of (1) rare variants and variants known to be determinants of resistance for at least one drug and (2) machine and statistical learning architectures in predicting phenotypic drug resistance to 10 anti-tuberculosis drugs. Specifically, we investigated multitask and single task wide and deep neural networks, a multilayer perceptron, regularized logistic regression, and random forest classifiers.
FINDINGS: The highest performing machine and statistical learning methods included both rare variants and those known to be causal for resistance to at least one drug. Both simpler L2 penalized regression and complex machine learning models had high predictive performance. The average AUCs for our highest performing model were 0.979 for first-line drugs and 0.936 for second-line drugs during repeated cross-validation. On an independent validation set, the highest performing model showed average AUCs, sensitivities, and specificities, respectively, of 0.937, 87.9%, and 92.7% for first-line drugs and 0.891, 82.0%, and 90.1% for second-line drugs. Our method outperforms existing approaches based on direct association, increasing the sum of sensitivity and specificity by 11.7% on first-line drugs and 3.2% on second-line drugs. Our method has higher predictive performance than previously reported machine learning models during cross-validation, with higher AUCs for 8 of 10 drugs.
INTERPRETATION: Statistical models, especially those that are trained using both frequent and less frequent variants, significantly improve the accuracy of resistance prediction and hold promise in bringing sequencing technologies closer to the bedside.
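The abstract reports performance as AUC, sensitivity, specificity, and the sum of sensitivity and specificity. As a rough illustration of how these quantities relate, here is a minimal Python sketch. The labels, scores, and the 0.5 threshold are toy values invented for this example and are not taken from the study; the function names are hypothetical and do not reflect the authors' code.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = resistant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a resistant isolate outscores a
    susceptible one (ties count as half), computed over all pairs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 4 resistant and 4 susceptible isolates.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 on the predicted probability.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc_value = auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} "
      f"sum={sens + spec:.3f} AUC={auc_value:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a record like this one reports both.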
format Online
Article
Text
id pubmed-6557804
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-6557804 2019-06-14 EBioMedicine Research paper Elsevier 2019-04-29 /pmc/articles/PMC6557804/ /pubmed/31047860 http://dx.doi.org/10.1016/j.ebiom.2019.04.016 Text en © 2019 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Beyond multidrug resistance: Leveraging rare variants with machine and statistical learning models in Mycobacterium tuberculosis resistance prediction
topic Research paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6557804/
https://www.ncbi.nlm.nih.gov/pubmed/31047860
http://dx.doi.org/10.1016/j.ebiom.2019.04.016