Predicting Complete Remission of Acute Myeloid Leukemia: Machine Learning Applied to Gene Expression
Machine learning (ML) is a useful tool for advancing our understanding of the patterns and significance of biomedical data. Given the growing trend toward applying ML techniques in precision medicine, here we present an ML technique that predicts the likelihood of complete remission (CR) in p...
Main Authors: | Gal, Ophir; Auslander, Noam; Fan, Yu; Meerzaman, Daoud |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications 2019 |
Subjects: | Short Report |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6423478/ https://www.ncbi.nlm.nih.gov/pubmed/30911218 http://dx.doi.org/10.1177/1176935119835544 |
_version_ | 1783404538144751616 |
---|---|
author | Gal, Ophir Auslander, Noam Fan, Yu Meerzaman, Daoud |
author_sort | Gal, Ophir |
collection | PubMed |
description | Machine learning (ML) is a useful tool for advancing our understanding of the patterns and significance of biomedical data. Given the growing trend toward applying ML techniques in precision medicine, here we present an ML technique that predicts the likelihood of complete remission (CR) in patients diagnosed with acute myeloid leukemia (AML). In this study, we explored the question of whether ML algorithms designed to analyze gene-expression patterns obtained through RNA sequencing (RNA-seq) can be used to accurately predict the likelihood of CR in pediatric AML patients who have received induction therapy. We employed tests of statistical significance to determine which genes were differentially expressed in the samples derived from patients who achieved CR after 2 courses of treatment and the samples taken from patients who did not benefit. We tuned classifier hyperparameters to optimize performance and used multiple methods to guide our feature selection as well as our assessment of algorithm performance. To identify the model that performed best within the context of this study, we plotted receiver operating characteristic (ROC) curves. Using the top 75 genes from the k-nearest neighbors algorithm (K-NN) model (K = 27) yielded the best area-under-the-curve (AUC) score that we obtained: 0.84. When we finally evaluated the models on the previously unseen test data set, the top 50 genes yielded the best AUC: 0.81. Pathway enrichment analysis for these 50 genes showed that the guanosine diphosphate fucose (GDP-fucose) biosynthesis pathway is the most significant, with an adjusted P value of .0092, which may suggest the vital role of N-glycosylation in AML. |
format | Online Article Text |
id | pubmed-6423478 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-6423478 2019-03-25 Predicting Complete Remission of Acute Myeloid Leukemia: Machine Learning Applied to Gene Expression Gal, Ophir Auslander, Noam Fan, Yu Meerzaman, Daoud Cancer Inform Short Report SAGE Publications 2019-03-15 /pmc/articles/PMC6423478/ /pubmed/30911218 http://dx.doi.org/10.1177/1176935119835544 Text en © The Author(s) 2019 http://www.creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
spellingShingle | Short Report Gal, Ophir Auslander, Noam Fan, Yu Meerzaman, Daoud Predicting Complete Remission of Acute Myeloid Leukemia: Machine Learning Applied to Gene Expression |
title | Predicting Complete Remission of Acute Myeloid Leukemia: Machine Learning Applied to Gene Expression |
topic | Short Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6423478/ https://www.ncbi.nlm.nih.gov/pubmed/30911218 http://dx.doi.org/10.1177/1176935119835544 |
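
The workflow summarized in the description field (rank genes by differential expression, keep the top-ranked ones, classify with K-NN, assess with ROC AUC on unseen data) can be illustrated with a minimal sketch. This is not the authors' code. It runs on synthetic data, and several choices are assumptions made only for illustration: Welch's t statistic as the significance test (the record says only "tests of statistical significance"), Euclidean distance, the sample and gene counts, and every function name. Only K = 27 is taken from the record.

```python
import math
import random


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed test, not from the record)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / denom if denom > 0 else 0.0


def top_genes(X, y, n_top):
    """Rank gene columns by |t| between CR (1) and non-CR (0) samples."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(pos, neg)), g))
    return [g for _, g in sorted(scores, reverse=True)[:n_top]]


def knn_score(X_train, y_train, x, genes, k):
    """Fraction of the k nearest training samples (Euclidean) labelled CR."""
    dists = sorted(
        (math.dist([row[g] for g in genes], [x[g] for g in genes]), label)
        for row, label in zip(X_train, y_train)
    )
    return sum(label for _, label in dists[:k]) / k


def roc_auc(labels, scores):
    """AUC as the chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_data(n_samples, n_genes, n_informative, rng):
    """Synthetic expression matrix; the first n_informative genes shift with the label."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [
            rng.gauss(1.0 * label if g < n_informative else 0.0, 1.0)
            for g in range(n_genes)
        ]
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(120, 200, 10, rng)  # illustrative sizes
X_test, y_test = make_data(60, 200, 10, rng)     # held-out samples

# Gene selection uses the training samples only, so the test set stays unseen.
genes = top_genes(X_train, y_train, n_top=10)
test_scores = [knn_score(X_train, y_train, x, genes, k=27) for x in X_test]
auc = roc_auc(y_test, test_scores)
print(f"held-out AUC on synthetic data: {auc:.2f}")
```

Selecting genes from the training samples alone mirrors the record's distinction between model selection and the final check on a previously unseen test set. The AUC printed here describes only the synthetic data and says nothing about the published results.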