Deep learning for clustering of multivariate clinical patient trajectories with missing values

BACKGROUND: Precision medicine requires a stratification of patients by disease presentation that is sufficiently informative to allow for selecting treatments on a per-patient basis. For many diseases, such as neurological disorders, this stratification problem translates into a complex problem of...

Bibliographic Details
Main authors: de Jong, Johann; Emon, Mohammad Asif; Wu, Ping; Karki, Reagon; Sood, Meemansa; Godard, Patrice; Ahmad, Ashar; Vrooman, Henri; Hofmann-Apitius, Martin; Fröhlich, Holger
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857688/
https://www.ncbi.nlm.nih.gov/pubmed/31730697
http://dx.doi.org/10.1093/gigascience/giz134
author de Jong, Johann
Emon, Mohammad Asif
Wu, Ping
Karki, Reagon
Sood, Meemansa
Godard, Patrice
Ahmad, Ashar
Vrooman, Henri
Hofmann-Apitius, Martin
Fröhlich, Holger
collection PubMed
description BACKGROUND: Precision medicine requires a stratification of patients by disease presentation that is sufficiently informative to allow for selecting treatments on a per-patient basis. For many diseases, such as neurological disorders, this stratification problem translates into a complex problem of clustering multivariate and relatively short time series because (i) these diseases are multifactorial and not well described by single clinical outcome variables and (ii) disease progression needs to be monitored over time. In addition, clinical data are often hindered by the presence of many missing values, further complicating any clustering attempt. FINDINGS: The problem of clustering multivariate short time series with many missing values is generally not well addressed in the literature. In this work, we propose a deep learning–based method to address this issue, variational deep embedding with recurrence (VaDER). VaDER relies on a Gaussian mixture variational autoencoder framework, which is further extended to (i) model multivariate time series and (ii) directly deal with missing values. We validated VaDER by accurately recovering clusters from simulated and benchmark data with known ground truth clustering, while varying the degree of missingness. We then used VaDER to successfully stratify patients with Alzheimer disease and patients with Parkinson disease into subgroups characterized by clinically divergent disease progression profiles. Additional analyses demonstrated that these clinical differences reflected known underlying aspects of Alzheimer disease and Parkinson disease. CONCLUSIONS: We believe our results show that VaDER can be of great value for future efforts in patient stratification, and multivariate time-series clustering in general.
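The abstract says the method deals with missing values directly but does not say how. As a rough illustration only, one common way for an autoencoder to tolerate missing entries is to compute the reconstruction error over observed values alone. The sketch below assumes that approach; the function name `masked_mse`, the array layout (patients x time points x variables), and the use of NaN to mark missing entries are all hypothetical and are not taken from the record or from the VaDER implementation.

```python
import numpy as np


def masked_mse(x, x_hat, observed):
    """Mean squared reconstruction error over observed entries only.

    Hypothetical helper for illustration; not the VaDER loss.
    x        : input trajectories, may contain NaN where values are missing
    x_hat    : reconstruction produced by some model, same shape as x
    observed : boolean mask, True where x holds a measured value
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    n_obs = observed.sum()
    if n_obs == 0:
        # Nothing was measured, so there is no error to report.
        return 0.0

    # Zero out the residual at missing positions so NaNs cannot propagate.
    diff = np.where(observed, x - x_hat, 0.0)
    return float((diff ** 2).sum() / n_obs)


# One patient, two time points, two clinical variables; one value is missing.
x = np.array([[[1.0, np.nan], [2.0, 4.0]]])
x_hat = np.array([[[1.5, 9.0], [2.0, 3.0]]])
loss = masked_mse(x, x_hat, ~np.isnan(x))
```

Dividing by the number of observed entries, rather than by the full array size, keeps the loss comparable between patients with different amounts of missing data.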
format Online
Article
Text
id pubmed-6857688
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6857688 2019-11-20 Gigascience Technical Note Oxford University Press 2019-11-15 /pmc/articles/PMC6857688/ /pubmed/31730697 http://dx.doi.org/10.1093/gigascience/giz134 Text en © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Deep learning for clustering of multivariate clinical patient trajectories with missing values
topic Technical Note