Deep significance clustering: a novel approach for identifying risk-stratified and predictive patient subgroups
OBJECTIVE: Deep significance clustering (DICE) is a self-supervised learning framework. DICE identifies clinically similar and risk-stratified subgroups that neither unsupervised clustering algorithms nor supervised risk prediction algorithms alone are guaranteed to generate. MATERIALS AND METHODS:...
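Main authors: | Huang, Yufang; Liu, Yifan; Steel, Peter A D; Axsom, Kelly M; Lee, John R; Tummalapalli, Sri Lekha; Wang, Fei; Pathak, Jyotishman; Subramanian, Lakshminarayanan; Zhang, Yiye |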
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Research and Applications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8500061/ https://www.ncbi.nlm.nih.gov/pubmed/34571540 http://dx.doi.org/10.1093/jamia/ocab203 |
_version_ | 1784580384219987968 |
---|---|
author | Huang, Yufang Liu, Yifan Steel, Peter A D Axsom, Kelly M Lee, John R Tummalapalli, Sri Lekha Wang, Fei Pathak, Jyotishman Subramanian, Lakshminarayanan Zhang, Yiye |
author_sort | Huang, Yufang |
collection | PubMed |
description | OBJECTIVE: Deep significance clustering (DICE) is a self-supervised learning framework. DICE identifies clinically similar and risk-stratified subgroups that neither unsupervised clustering algorithms nor supervised risk prediction algorithms alone are guaranteed to generate. MATERIALS AND METHODS: Enabled by an optimization process that enforces statistical significance between the outcome and subgroup membership, DICE jointly trains 3 components, representation learning, clustering, and outcome prediction while providing interpretability to the deep representations. DICE also allows unseen patients to be predicted into trained subgroups for population-level risk stratification. We evaluated DICE using electronic health record datasets derived from 2 urban hospitals. Outcomes and patient cohorts used include discharge disposition to home among heart failure (HF) patients and acute kidney injury among COVID-19 (Cov-AKI) patients, respectively. RESULTS: Compared to baseline approaches including principal component analysis, DICE demonstrated superior performance in the cluster purity metrics: Silhouette score (0.48 for HF, 0.51 for Cov-AKI), Calinski-Harabasz index (212 for HF, 254 for Cov-AKI), and Davies-Bouldin index (0.86 for HF, 0.66 for Cov-AKI), and prediction metric: area under the Receiver operating characteristic (ROC) curve (0.83 for HF, 0.78 for Cov-AKI). Clinical evaluation of DICE-generated subgroups revealed more meaningful distributions of member characteristics across subgroups, and higher risk ratios between subgroups. Furthermore, DICE-generated subgroup membership alone was moderately predictive of outcomes. DISCUSSION: DICE addresses a gap in current machine learning approaches where predicted risk may not lead directly to actionable clinical steps. CONCLUSION: DICE demonstrated the potential to apply in heterogeneous populations, where having the same quantitative risk does not equate with having a similar clinical profile. |
format | Online Article Text |
id | pubmed-8500061 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8500061 2021-10-08 J Am Med Inform Assoc Research and Applications Oxford University Press 2021-09-27 /pmc/articles/PMC8500061/ /pubmed/34571540 http://dx.doi.org/10.1093/jamia/ocab203 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model) |
title | Deep significance clustering: a novel approach for identifying risk-stratified and predictive patient subgroups |
topic | Research and Applications |
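
The abstract reports "higher risk ratios between subgroups" without defining the quantity. As a reading aid only, the sketch below computes a plain risk ratio between two subgroups: the outcome rate in one subgroup divided by the outcome rate in a reference subgroup. The function name `risk_ratio` and all counts are hypothetical; they are not taken from the paper and this is not the authors' implementation.

```python
def risk_ratio(events_a, total_a, events_b, total_b):
    """Outcome risk in subgroup A divided by outcome risk in reference subgroup B."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("subgroup sizes must be positive")
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    if risk_b == 0:
        raise ZeroDivisionError("reference subgroup has no events; ratio undefined")
    return risk_a / risk_b


# Hypothetical counts for illustration only (not data from the study):
# 50 of 100 patients in one subgroup had the outcome, versus 25 of 100
# patients in the reference subgroup.
rr = risk_ratio(50, 100, 25, 100)
print(f"risk ratio = {rr:.2f}")
```

A ratio near 1 would mean the two subgroups carry similar risk; values further from 1 indicate stronger separation in risk between the subgroups.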