
Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia

OBJECTIVE: Unsupervised machine learning approaches hold promise for large-scale clinical data. However, the heterogeneity of clinical data raises new methodological challenges in feature selection, choosing a distance metric that captures biological meaning, and visualization. We hypothesized that...


Bibliographic Details
Main Authors: Coombes, Caitlin E, Abrams, Zachary B, Li, Suli, Abruzzo, Lynne V, Coombes, Kevin R
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7647286/
https://www.ncbi.nlm.nih.gov/pubmed/32483590
http://dx.doi.org/10.1093/jamia/ocaa060
_version_ 1783606895935750144
author Coombes, Caitlin E
Abrams, Zachary B
Li, Suli
Abruzzo, Lynne V
Coombes, Kevin R
author_facet Coombes, Caitlin E
Abrams, Zachary B
Li, Suli
Abruzzo, Lynne V
Coombes, Kevin R
author_sort Coombes, Caitlin E
collection PubMed
description OBJECTIVE: Unsupervised machine learning approaches hold promise for large-scale clinical data. However, the heterogeneity of clinical data raises new methodological challenges in feature selection, choosing a distance metric that captures biological meaning, and visualization. We hypothesized that clustering could discover prognostic groups from patients with chronic lymphocytic leukemia, a disease that provides biological validation through well-understood outcomes. METHODS: To address this challenge, we applied k-medoids clustering with 10 distance metrics to 2 experiments (“A” and “B”) with mixed clinical features collapsed to binary vectors and visualized with both multidimensional scaling and t-distributed stochastic neighbor embedding. To assess prognostic utility, we performed survival analysis using a Cox proportional hazards model, log-rank test, and Kaplan-Meier curves. RESULTS: In both experiments, survival analysis revealed a statistically significant association between clusters and survival outcomes (A: overall survival, P = .0164; B: time from diagnosis to treatment, P = .0039). Multidimensional scaling separated clusters along a gradient mirroring the order of overall survival. Longer survival was associated with mutated immunoglobulin heavy-chain variable region gene (IGHV) status, absent ZAP70 expression, female sex, and younger age. CONCLUSIONS: This approach to mixed-type data handling and selection of distance metric captured well-understood, binary, prognostic markers in chronic lymphocytic leukemia (sex, IGHV mutation status, ZAP70 expression status) with high fidelity.
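The clustering step named in the METHODS summary (k-medoids over binary feature vectors with a chosen distance metric) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the record does not name the 10 distance metrics, so Jaccard distance stands in as one common choice for binary data; the update rule is the simple alternating (assign, then re-pick medoid) scheme rather than any specific implementation the authors used; and the function names and the six toy "patients" are hypothetical, not data from the study.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def assign(dist, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(vectors, k, max_iter=100):
    """Alternating k-medoids on a precomputed Jaccard distance matrix."""
    n = len(vectors)
    dist = [[jaccard_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

    # Deterministic farthest-first initialisation (an assumption; random
    # or PAM "build" initialisation are equally valid choices).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(min(
                members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids

    return medoids, assign(dist, medoids)


# Hypothetical toy data: rows are patients, columns are binary features.
patients = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
medoids, labels = k_medoids(patients, k=2)
print(medoids, labels)
```

Because the distance matrix is computed once up front, swapping in another binary dissimilarity only requires replacing `jaccard_distance`; the resulting cluster labels could then be passed to a survival analysis as a grouping variable.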
format Online
Article
Text
id pubmed-7647286
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7647286 2020-11-30 Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia Coombes, Caitlin E Abrams, Zachary B Li, Suli Abruzzo, Lynne V Coombes, Kevin R J Am Med Inform Assoc Research and Applications
Oxford University Press 2020-06-01 /pmc/articles/PMC7647286/ /pubmed/32483590 http://dx.doi.org/10.1093/jamia/ocaa060 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Research and Applications
Coombes, Caitlin E
Abrams, Zachary B
Li, Suli
Abruzzo, Lynne V
Coombes, Kevin R
Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia
title Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia
title_full Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia
title_fullStr Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia
title_full_unstemmed Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia
title_short Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia
title_sort unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7647286/
https://www.ncbi.nlm.nih.gov/pubmed/32483590
http://dx.doi.org/10.1093/jamia/ocaa060
work_keys_str_mv AT coombescaitline unsupervisedmachinelearningandprognosticfactorsofsurvivalinchroniclymphocyticleukemia
AT abramszacharyb unsupervisedmachinelearningandprognosticfactorsofsurvivalinchroniclymphocyticleukemia
AT lisuli unsupervisedmachinelearningandprognosticfactorsofsurvivalinchroniclymphocyticleukemia
AT abruzzolynnev unsupervisedmachinelearningandprognosticfactorsofsurvivalinchroniclymphocyticleukemia
AT coombeskevinr unsupervisedmachinelearningandprognosticfactorsofsurvivalinchroniclymphocyticleukemia