
Multi-Label Random Forest Model for Tuberculosis Drug Resistance Classification and Mutation Ranking

Resistance prediction and mutation ranking are important tasks in the analysis of Tuberculosis sequence data. Due to standard regimens for the use of first-line antibiotics, resistance co-occurrence, in which samples are resistant to multiple drugs, is common. Analysing all drugs simultaneously shou...


Bibliographic Details
Main authors: Kouchaki, Samaneh; Yang, Yang; Lachapelle, Alexander; Walker, Timothy M.; Walker, A. Sarah; Peto, Timothy E. A.; Crook, Derrick W.; Clifton, David A.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7188832/
https://www.ncbi.nlm.nih.gov/pubmed/32390972
http://dx.doi.org/10.3389/fmicb.2020.00667
author Kouchaki, Samaneh
Yang, Yang
Lachapelle, Alexander
Walker, Timothy M.
Walker, A. Sarah
Peto, Timothy E. A.
Crook, Derrick W.
Clifton, David A.
collection PubMed
description Resistance prediction and mutation ranking are important tasks in the analysis of Tuberculosis sequence data. Due to standard regimens for the use of first-line antibiotics, resistance co-occurrence, in which samples are resistant to multiple drugs, is common. Analysing all drugs simultaneously should therefore enable patterns reflecting resistance co-occurrence to be exploited for resistance prediction. Here, multi-label random forest (MLRF) models are compared with single-label random forest (SLRF) models for both predicting phenotypic resistance from whole genome sequences and identifying important mutations for better prediction of resistance to four first-line drugs in a dataset of 13,402 Mycobacterium tuberculosis isolates. Results confirmed that MLRFs can improve performance compared to conventional clinical methods (by 18.10%) and SLRFs (by 0.91%). In addition, we identified a list of candidate mutations that are important for resistance prediction or that are related to resistance co-occurrence. Moreover, we found that retraining our models on only a subset of top-ranked mutations was sufficient to achieve satisfactory performance. The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php.
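The contrast between the multi-label and single-label setups described in the abstract can be illustrated with a minimal sketch. This is not the authors' code (which is at the URL above); it assumes scikit-learn's `RandomForestClassifier`, which accepts a two-dimensional binary label matrix and then fits one forest over all labels jointly. The drug abbreviations, the toy data generator, and all helper names are hypothetical and exist only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed label order: the record only says "four first-line drugs".
DRUGS = ["INH", "RIF", "EMB", "PZA"]


def make_toy_data(n_samples=400, n_mutations=30, seed=0):
    """Synthetic binary mutation matrix X and resistance labels Y (toy rule)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_samples, n_mutations))
    # Hypothetical rule: mutation 0 drives resistance to the first two drugs
    # (so they co-occur); mutations 1 and 2 drive the remaining two drugs.
    Y = np.column_stack([X[:, 0], X[:, 0], X[:, 1], X[:, 2]])
    return X, Y


def fit_mlrf(X, Y, seed=0):
    """Multi-label forest: one model trained on all drug labels at once."""
    return RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, Y)


def fit_slrf(X, Y, seed=0):
    """Single-label baseline: an independent forest per drug."""
    return [
        RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, Y[:, j])
        for j in range(Y.shape[1])
    ]


def predict_slrf(models, X):
    """Stack the per-drug predictions into an (n_samples, n_drugs) matrix."""
    return np.column_stack([m.predict(X) for m in models])


def rank_mutations(model, top_k=5):
    """Indices of the top_k mutations by impurity-based feature importance."""
    order = np.argsort(model.feature_importances_)[::-1]
    return order[:top_k].tolist()


def per_drug_accuracy(Y_true, Y_pred):
    """Fraction of correctly classified isolates for each drug."""
    return (np.asarray(Y_true) == np.asarray(Y_pred)).mean(axis=0)


if __name__ == "__main__":
    X, Y = make_toy_data()
    X_train, Y_train, X_test, Y_test = X[:300], Y[:300], X[300:], Y[300:]

    mlrf = fit_mlrf(X_train, Y_train)
    slrf = fit_slrf(X_train, Y_train)

    acc_ml = per_drug_accuracy(Y_test, mlrf.predict(X_test))
    acc_sl = per_drug_accuracy(Y_test, predict_slrf(slrf, X_test))
    print("MLRF:", dict(zip(DRUGS, acc_ml)))
    print("SLRF:", dict(zip(DRUGS, acc_sl)))
    print("Top-ranked mutation indices (MLRF):", rank_mutations(mlrf))
```

In this sketch the multi-label forest chooses each split using the impurity averaged over all four labels, which is one way shared resistance patterns can influence the model; the single-label baseline sees each drug in isolation. Refitting on only the columns returned by `rank_mutations` would mimic the top-ranked-subset experiment mentioned in the abstract, though the paper's actual ranking procedure may differ.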
format Online
Article
Text
id pubmed-7188832
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7188832 2020-05-08 Front Microbiol Microbiology Frontiers Media S.A. 2020-04-22 /pmc/articles/PMC7188832/ /pubmed/32390972 http://dx.doi.org/10.3389/fmicb.2020.00667 Text en Copyright © 2020 Kouchaki, Yang, Lachapelle, Walker, Walker, CRyPTIC Consortium, Peto, Crook and Clifton. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Multi-Label Random Forest Model for Tuberculosis Drug Resistance Classification and Mutation Ranking
topic Microbiology