Challenges of Clustering Multimodal Clinical Data: Review of Applications in Asthma Subtyping

BACKGROUND: In the current era of personalized medicine, there is increasing interest in understanding the heterogeneity in disease populations. Cluster analysis is a method commonly used to identify subtypes in heterogeneous disease populations. The clinical data used in such applications are typic...

Bibliographic Details
Main Authors: Horne, Elsie, Tibble, Holly, Sheikh, Aziz, Tsanas, Athanasios
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7290450/
https://www.ncbi.nlm.nih.gov/pubmed/32463370
http://dx.doi.org/10.2196/16452
author Horne, Elsie
Tibble, Holly
Sheikh, Aziz
Tsanas, Athanasios
collection PubMed
description BACKGROUND: In the current era of personalized medicine, there is increasing interest in understanding the heterogeneity in disease populations. Cluster analysis is a method commonly used to identify subtypes in heterogeneous disease populations. The clinical data used in such applications are typically multimodal, which can make the application of traditional cluster analysis methods challenging.
OBJECTIVE: This study aimed to review the research literature on the application of clustering multimodal clinical data to identify asthma subtypes. We assessed common problems and shortcomings in the application of cluster analysis methods in determining asthma subtypes, such that they can be brought to the attention of the research community and avoided in future studies.
METHODS: We searched PubMed and Scopus bibliographic databases with terms related to cluster analysis and asthma to identify studies that applied dissimilarity-based cluster analysis methods. We recorded the analytic methods used in each study at each step of the cluster analysis process.
RESULTS: Our literature search identified 63 studies that applied cluster analysis to multimodal clinical data to identify asthma subtypes. The features fed into the cluster algorithms were of a mixed type in 47 (75%) studies and continuous in 12 (19%), and the feature type was unclear in the remaining 4 (6%) studies. A total of 23 (37%) studies used hierarchical clustering with Ward linkage, and 22 (35%) studies used k-means clustering. Of these 45 studies, 39 had mixed-type features, but only 5 specified dissimilarity measures that could handle mixed-type features. A further 9 (14%) studies used a preclustering step to create small clusters to feed into a hierarchical method. The original sample sizes in these 9 studies ranged from 84 to 349. The remaining studies used hierarchical clustering with other linkages (n=3), medoid-based methods (n=3), spectral clustering (n=1), and multiple kernel k-means clustering (n=1), and in 1 study, the methods were unclear. Of 63 studies, 54 (86%) explained the methods used to determine the number of clusters, 24 (38%) studies tested the quality of their cluster solution, and 11 (17%) studies tested the stability of their solution. Reporting of the cluster analysis was generally poor in terms of the methods employed and their justification.
CONCLUSIONS: This review highlights common issues in the application of cluster analysis to multimodal clinical data to identify asthma subtypes. Some of these issues were related to the multimodal nature of the data, but many were more general issues in the application of cluster analysis. Although cluster analysis may be a useful tool for investigating disease subtypes, we recommend that future studies carefully consider the implications of clustering multimodal data, the cluster analysis process itself, and the reporting of methods to facilitate replication and interpretation of findings.
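The abstract notes that most reviewed studies clustered mixed-type features, yet few specified a dissimilarity measure suited to them. As an illustration only, the sketch below computes a Gower-style dissimilarity, one well-known measure for mixed numeric and categorical data. The record does not state which measures the reviewed studies used, and the patient records, feature names, and helper function here are hypothetical.

```python
from itertools import combinations


def gower_dissimilarity(a, b, is_numeric, ranges):
    """Mean per-feature dissimilarity between two records (0 = identical, 1 = maximally different)."""
    total = 0.0
    for x, y, numeric, value_range in zip(a, b, is_numeric, ranges):
        if numeric:
            # Numeric feature: absolute difference scaled by the observed range.
            total += abs(x - y) / value_range if value_range > 0 else 0.0
        else:
            # Categorical feature: simple mismatch indicator.
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Hypothetical records: (age in years, FEV1 % predicted, atopy status).
patients = [
    (12.0, 95.0, "atopic"),
    (45.0, 60.0, "non-atopic"),
    (14.0, 90.0, "atopic"),
]
is_numeric = [True, True, False]

# Observed range of each numeric column; categorical columns get a placeholder of 0.0.
ranges = [
    (max(column) - min(column)) if numeric else 0.0
    for column, numeric in zip(zip(*patients), is_numeric)
]

# Dissimilarity for every unordered pair of patients.
pairwise = {
    (i, j): gower_dissimilarity(patients[i], patients[j], is_numeric, ranges)
    for i, j in combinations(range(len(patients)), 2)
}

for pair, value in sorted(pairwise.items()):
    print(pair, round(value, 3))
```

A matrix of such dissimilarities can be passed to methods that accept arbitrary dissimilarities, such as medoid-based clustering or hierarchical clustering with average linkage. Ward linkage and k-means are built around Euclidean distances, which is one reason applying them directly to mixed-type features needs care.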
format Online
Article
Text
id pubmed-7290450
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7290450 2020-06-19 Challenges of Clustering Multimodal Clinical Data: Review of Applications in Asthma Subtyping Horne, Elsie Tibble, Holly Sheikh, Aziz Tsanas, Athanasios JMIR Med Inform Review JMIR Publications 2020-05-28 /pmc/articles/PMC7290450/ /pubmed/32463370 http://dx.doi.org/10.2196/16452 Text en
©Elsie Horne, Holly Tibble, Aziz Sheikh, Athanasios Tsanas. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 28.05.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Challenges of Clustering Multimodal Clinical Data: Review of Applications in Asthma Subtyping
topic Review