
Deep significance clustering: a novel approach for identifying risk-stratified and predictive patient subgroups

Bibliographic details
Main authors: Huang, Yufang; Liu, Yifan; Steel, Peter A D; Axsom, Kelly M; Lee, John R; Tummalapalli, Sri Lekha; Wang, Fei; Pathak, Jyotishman; Subramanian, Lakshminarayanan; Zhang, Yiye
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2021
Subjects: Research and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8500061/
https://www.ncbi.nlm.nih.gov/pubmed/34571540
http://dx.doi.org/10.1093/jamia/ocab203
collection PubMed
description
OBJECTIVE: Deep significance clustering (DICE) is a self-supervised learning framework. DICE identifies clinically similar and risk-stratified subgroups that neither unsupervised clustering algorithms nor supervised risk prediction algorithms alone are guaranteed to generate.
MATERIALS AND METHODS: DICE uses an optimization process that enforces statistical significance between the outcome and subgroup membership. This process lets it jointly train 3 components (representation learning, clustering, and outcome prediction) while making the deep representations interpretable. DICE can also assign unseen patients to trained subgroups for population-level risk stratification. We evaluated DICE on electronic health record datasets derived from 2 urban hospitals. The outcomes and patient cohorts studied were discharge disposition to home among heart failure (HF) patients and acute kidney injury among COVID-19 patients (Cov-AKI).
RESULTS: DICE was compared with baseline approaches, including principal component analysis. It performed better on the cluster purity metrics: Silhouette score (0.48 for HF, 0.51 for Cov-AKI), Calinski-Harabasz index (212 for HF, 254 for Cov-AKI), and Davies-Bouldin index (0.86 for HF, 0.66 for Cov-AKI). It also performed better on the prediction metric, the area under the receiver operating characteristic (ROC) curve (0.83 for HF, 0.78 for Cov-AKI). Clinical evaluation of DICE-generated subgroups revealed more meaningful distributions of member characteristics across subgroups and higher risk ratios between subgroups. Furthermore, DICE-generated subgroup membership alone was moderately predictive of outcomes.
DISCUSSION: DICE addresses a gap in current machine learning approaches, where predicted risk may not lead directly to actionable clinical steps.
CONCLUSION: DICE demonstrated potential for application in heterogeneous populations, where having the same quantitative risk does not mean having a similar clinical profile.
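The RESULTS section reports three internal cluster-quality indices and the area under the ROC curve. The sketch below shows roughly what those numbers measure. It computes each one from its standard textbook definition, which is the same quantity scikit-learn exposes as `silhouette_score`, `calinski_harabasz_score`, `davies_bouldin_score` and `roc_auc_score`.

This is not the authors' evaluation code. The helper names, toy points, labels and scores are hypothetical, and the code assumes Python 3.8+ for `math.dist`. Higher Silhouette and Calinski-Harabasz values, and lower Davies-Bouldin values, indicate tighter and better-separated clusters.

```python
import math

# Plain-Python versions of the standard metrics named in the abstract.
# Hypothetical helper names; not taken from any DICE implementation.

def _groups(X, labels):
    """Map each cluster label to the list of points assigned to it."""
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    return groups

def _centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def silhouette(X, labels):
    """Mean silhouette width: (b - a) / max(a, b), averaged over all points."""
    groups = _groups(X, labels)
    scores = []
    for x, lab in zip(X, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of x's own cluster (x itself adds 0)
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(x, y) for y in pts) / len(pts)
                for other, pts in groups.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion, each divided by its degrees of freedom."""
    groups = _groups(X, labels)
    n, k = len(X), len(groups)
    overall = _centroid(X)
    cents = {lab: _centroid(pts) for lab, pts in groups.items()}
    between = sum(len(pts) * math.dist(cents[lab], overall) ** 2 for lab, pts in groups.items())
    within = sum(math.dist(p, cents[lab]) ** 2 for lab, pts in groups.items() for p in pts)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(X, labels):
    """Average over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio."""
    groups = _groups(X, labels)
    cents = {lab: _centroid(pts) for lab, pts in groups.items()}
    # s_i: mean distance of a cluster's points to its centroid
    spread = {lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
              for lab, pts in groups.items()}
    worst = [max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                 for j in groups if j != i) for i in groups]
    return sum(worst) / len(worst)

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: two well-separated groups of three points each
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
print(round(silhouette(X, labels), 2))         # → 0.92
print(round(calinski_harabasz(X, labels), 1))  # → 450.0
print(round(davies_bouldin(X, labels), 2))     # → 0.09
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

The pairwise loops are quadratic in the number of patients. For real cohorts, the vectorized scikit-learn functions named above are the practical choice.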
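DICE enforces a statistically significant association between subgroup membership and the outcome during training. Its clinical evaluation then compares risk ratios between subgroups. The specific test DICE uses is described in the paper and is not reproduced here.

As a hedged, post hoc stand-in, the sketch below does two things for a pair of subgroups:

- runs a Pearson chi-squared test of independence between membership and outcome;
- computes the risk ratio with a Wald-type 95% confidence interval on the log scale.

The function names and subgroup counts are hypothetical.

```python
import math

def risk_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Outcome risk in subgroup A relative to subgroup B, with a log-scale (Wald) CI."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR); requires non-zero event counts in both subgroups
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

def chi2_two_groups(events_a, n_a, events_b, n_b):
    """Pearson chi-squared test (no continuity correction) on the 2x2 table
    subgroup (A, B) x outcome (event, no event); returns (statistic, p-value)."""
    table = [[events_a, n_a - events_a], [events_b, n_b - events_b]]
    row_totals = [n_a, n_b]
    col_totals = [events_a + events_b, (n_a - events_a) + (n_b - events_b)]
    n = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 30 of 100 patients in subgroup A and 10 of 100 in subgroup B had the outcome
rr, lo, hi = risk_ratio(30, 100, 10, 100)
stat, p = chi2_two_groups(30, 100, 10, 100)
print(round(rr, 2), round(lo, 2), round(hi, 2))  # → 3.0 1.55 5.8
print(stat, round(p, 4))                         # → 12.5 0.0004
```

The chi-squared approximation needs reasonably large expected counts (a common rule of thumb is at least 5 per cell). With k subgroups, the statistic has k - 1 degrees of freedom and needs a general chi-squared survival function, such as the one behind `scipy.stats.chi2_contingency`.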
id pubmed-8500061
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Record: pubmed-8500061, 2021-10-08
Journal: J Am Med Inform Assoc (Research and Applications)
Published online: 2021-09-27
Rights: © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).