
Using the distance between sets of hierarchical taxonomic clinical concepts to measure patient similarity

BACKGROUND: Many clinical concepts are standardized under a categorical and hierarchical taxonomy such as ICD-10, ATC, etc. These taxonomic clinical concepts provide insight into semantic meaning and similarity among clinical concepts and have been applied to patient similarity measures. However, th...


Bibliographic Details
Main Authors: Jia, Zheng, Lu, Xudong, Duan, Huilong, Li, Haomin
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6485152/
https://www.ncbi.nlm.nih.gov/pubmed/31023325
http://dx.doi.org/10.1186/s12911-019-0807-y
_version_ 1783414226763644928
author Jia, Zheng
Lu, Xudong
Duan, Huilong
Li, Haomin
collection PubMed
description BACKGROUND: Many clinical concepts are standardized under a categorical and hierarchical taxonomy such as ICD-10 or ATC. These taxonomic clinical concepts provide insight into semantic meaning and similarity among clinical concepts and have been applied to patient similarity measures. However, the effects of diverse set sizes of taxonomic clinical concepts contributing to similarity at the patient level have not been well studied. METHODS: In this paper the most widely used taxonomic clinical concept system, ICD-10, was studied as a representative taxonomy. The distance between ICD-10-coded diagnosis sets is an integrated estimation of the information content of each concept, the similarity between each pair of concepts, and the similarity between the sets of concepts. We proposed a novel set-level similarity method to calculate the distance between sets of hierarchical taxonomic clinical concepts to measure patient similarity. A real-world clinical dataset with ICD-10-coded diagnoses and hospital length of stay (HLOS) information was used to evaluate the performance of various algorithms and their combinations in predicting whether a patient needs long-term hospitalization. Four subpopulation prototypes, defined based on age and HLOS and having different diagnosis set sizes, were used as the targets for similarity analysis. The F-score was used to evaluate the performance of different algorithms while controlling for other factors. We also evaluated the effect of prototype set size on prediction precision. RESULTS: The results identified the strengths and weaknesses of different algorithms for computing information content, code-level similarity, and set-level similarity under different contexts, such as set size and concept set background. The minimum weighted bipartite matching approach, which has not been fully recognized previously, showed unique advantages in measuring concept-based patient similarity.
CONCLUSIONS: This study provides a systematic benchmark evaluation of previous algorithms and novel algorithms used in taxonomic concepts-based patient similarity, and it provides the basis for selecting appropriate methods under different clinical scenarios. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-019-0807-y) contains supplementary material, which is available to authorized users.
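The set-level step named in the abstract, minimum weighted bipartite matching between two ICD-10 diagnosis sets, can be illustrated with a small sketch. Everything below is an assumption for illustration and not the authors' implementation: the function names `normalize`, `code_distance`, and `set_distance` are hypothetical, the code-level distance is a simple shared-prefix stand-in for the information-content-based measures the paper evaluates, the `unmatched_cost` penalty for codes left without a partner is an invented parameter, and the matching is found by brute force, which is only practical for small sets.

```python
from itertools import permutations


def normalize(code: str) -> str:
    """Drop the dot and upper-case an ICD-10 code, e.g. 'e11.9' -> 'E119'."""
    return code.replace(".", "").upper()


def code_distance(a: str, b: str) -> float:
    """Hypothetical code-level distance in [0, 1].

    Uses the length of the shared prefix as a crude proxy for how deep
    the two codes' common ancestor sits in the ICD-10 hierarchy.
    """
    a, b = normalize(a), normalize(b)
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return 1.0 - shared / max(len(a), len(b))


def set_distance(set_a, set_b, unmatched_cost: float = 1.0) -> float:
    """Set-level distance via minimum weighted bipartite matching.

    Every code in the smaller set is paired with a distinct code in the
    larger set so that the summed code-level distance is minimal. Codes
    left unpaired are charged `unmatched_cost` (an assumption), and the
    total is averaged over the size of the larger set.
    """
    small, large = sorted((list(set_a), list(set_b)), key=len)
    if not large:
        return 0.0  # two empty sets are treated as identical
    best = float("inf")
    # Brute force: try every injective assignment of small into large.
    for perm in permutations(large, len(small)):
        cost = sum(code_distance(s, t) for s, t in zip(small, perm))
        best = min(best, cost)
    unmatched = (len(large) - len(small)) * unmatched_cost
    return (best + unmatched) / len(large)


if __name__ == "__main__":
    patient = {"E11.9", "I10"}
    prototype = {"E11.2", "I10"}
    print(set_distance(patient, prototype))
```

For diagnosis sets of realistic size, the brute-force loop would normally be replaced by a polynomial-time assignment solver such as the Hungarian method; the matching criterion stays the same.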
format Online
Article
Text
id pubmed-6485152
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6485152 2019-05-03 Using the distance between sets of hierarchical taxonomic clinical concepts to measure patient similarity Jia, Zheng Lu, Xudong Duan, Huilong Li, Haomin BMC Med Inform Decis Mak Research Article BioMed Central 2019-04-25 /pmc/articles/PMC6485152/ /pubmed/31023325 http://dx.doi.org/10.1186/s12911-019-0807-y Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Using the distance between sets of hierarchical taxonomic clinical concepts to measure patient similarity
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6485152/
https://www.ncbi.nlm.nih.gov/pubmed/31023325
http://dx.doi.org/10.1186/s12911-019-0807-y