Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification

BACKGROUND: Chronic Kidney Disease (CKD) is one of several conditions that affect a growing percentage of the US population; the disease is accompanied by multiple co-morbidities, and is hard to diagnose in-and-of itself. In its advanced forms it carries severe outcomes and can lead to death. It is...

Bibliographic Details
Main Authors: Bhattacharya, Moumita; Jurkovitz, Claudine; Shatkay, Hagit
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6290512/
https://www.ncbi.nlm.nih.gov/pubmed/30537962
http://dx.doi.org/10.1186/s12911-018-0675-x
_version_ 1783380100966776832
author Bhattacharya, Moumita
Jurkovitz, Claudine
Shatkay, Hagit
author_facet Bhattacharya, Moumita
Jurkovitz, Claudine
Shatkay, Hagit
author_sort Bhattacharya, Moumita
collection PubMed
description BACKGROUND: Chronic Kidney Disease (CKD) is one of several conditions that affect a growing percentage of the US population; the disease is accompanied by multiple co-morbidities, and is hard to diagnose in-and-of itself. In its advanced forms it carries severe outcomes and can lead to death. It is thus important to detect the disease as early as possible, which can help devise effective intervention and treatment plans. Here we investigate ways to utilize information available in electronic health records (EHRs) from regular office visits of more than 13,000 patients, in order to distinguish among several stages of the disease. While clinical data stored in EHRs provide valuable information for risk-stratification, one of the major challenges in using them arises from data imbalance. That is, records associated with a more severe condition are typically under-represented compared to those associated with a milder manifestation of the disease. To address imbalance, we propose and develop a sampling-based ensemble approach, hierarchical meta-classification, aiming to stratify CKD patients into severity stages, using simple quantitative non-text features gathered from standard office visit records. METHODS: The proposed hierarchical meta-classification method frames the multiclass classification task as a hierarchy of two subtasks. The first is binary classification, separating records associated with the majority class from those associated with all minority classes combined, using meta-classification. The second subtask separates the records assigned to the combined minority classes into the individual constituent classes. RESULTS: The proposed method identifies a significant proportion of patients suffering from the more advanced stages of the condition, while also correctly identifying most of the less severe cases, maintaining high sensitivity, specificity and F-measure (≥ 93%). Our results show that the high level of performance attained by our method is preserved even when the size of the training set is significantly reduced, demonstrating the stability and generalizability of our approach. CONCLUSION: We present a new approach to perform classification while addressing data imbalance, which is inherent in the biomedical domain. Our model effectively identifies severity stages of CKD patients, using information readily available in office visit records within the realistic context of high data imbalance.
format Online
Article
Text
id pubmed-6290512
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6290512 2018-12-17 Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification Bhattacharya, Moumita Jurkovitz, Claudine Shatkay, Hagit BMC Med Inform Decis Mak Research BACKGROUND: Chronic Kidney Disease (CKD) is one of several conditions that affect a growing percentage of the US population; the disease is accompanied by multiple co-morbidities, and is hard to diagnose in-and-of itself. In its advanced forms it carries severe outcomes and can lead to death. It is thus important to detect the disease as early as possible, which can help devise effective intervention and treatment plans. Here we investigate ways to utilize information available in electronic health records (EHRs) from regular office visits of more than 13,000 patients, in order to distinguish among several stages of the disease. While clinical data stored in EHRs provide valuable information for risk-stratification, one of the major challenges in using them arises from data imbalance. That is, records associated with a more severe condition are typically under-represented compared to those associated with a milder manifestation of the disease. To address imbalance, we propose and develop a sampling-based ensemble approach, hierarchical meta-classification, aiming to stratify CKD patients into severity stages, using simple quantitative non-text features gathered from standard office visit records. METHODS: The proposed hierarchical meta-classification method frames the multiclass classification task as a hierarchy of two subtasks. The first is binary classification, separating records associated with the majority class from those associated with all minority classes combined, using meta-classification. The second subtask separates the records assigned to the combined minority classes into the individual constituent classes. RESULTS: The proposed method identifies a significant proportion of patients suffering from the more advanced stages of the condition, while also correctly identifying most of the less severe cases, maintaining high sensitivity, specificity and F-measure (≥ 93%). Our results show that the high level of performance attained by our method is preserved even when the size of the training set is significantly reduced, demonstrating the stability and generalizability of our approach. CONCLUSION: We present a new approach to perform classification while addressing data imbalance, which is inherent in the biomedical domain. Our model effectively identifies severity stages of CKD patients, using information readily available in office visit records within the realistic context of high data imbalance. BioMed Central 2018-12-12 /pmc/articles/PMC6290512/ /pubmed/30537962 http://dx.doi.org/10.1186/s12911-018-0675-x Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Bhattacharya, Moumita
Jurkovitz, Claudine
Shatkay, Hagit
Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification
title Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification
title_full Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification
title_fullStr Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification
title_full_unstemmed Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification
title_short Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification
title_sort chronic kidney disease stratification using office visit records: handling data imbalance via hierarchical meta-classification
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6290512/
https://www.ncbi.nlm.nih.gov/pubmed/30537962
http://dx.doi.org/10.1186/s12911-018-0675-x
work_keys_str_mv AT bhattacharyamoumita chronickidneydiseasestratificationusingofficevisitrecordshandlingdataimbalanceviahierarchicalmetaclassification
AT jurkovitzclaudine chronickidneydiseasestratificationusingofficevisitrecordshandlingdataimbalanceviahierarchicalmetaclassification
AT shatkayhagit chronickidneydiseasestratificationusingofficevisitrecordshandlingdataimbalanceviahierarchicalmetaclassification
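
Illustrative sketch (not part of the catalog record): the METHODS portion of the abstract describes a two-level scheme in which a first stage separates the majority class from all minority classes combined, and a second stage splits the combined minority group into its constituent classes. The Python below is a minimal sketch of that idea under several assumptions that the abstract does not confirm: the base learner is a toy nearest-centroid classifier, the stage-1 ensemble is built by undersampling the majority class, and a plain majority vote stands in for the meta-classifier. The class names (NearestCentroid, HierarchicalSketch), the labels (stage_low, stage_mid, stage_high), and the data points are all hypothetical and synthetic; none of them come from the article.

```python
import random
from collections import Counter


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy base learner (an assumption; the abstract names no base classifier)."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids = {lab: centroid(rows) for lab, rows in groups.items()}
        return self

    def predict_one(self, row):
        return min(self.centroids, key=lambda lab: sq_dist(row, self.centroids[lab]))


class HierarchicalSketch:
    """Stage 1: majority vs. combined minority; stage 2: split the minority."""

    def __init__(self, majority_label, n_models=5, seed=0):
        self.majority_label = majority_label
        self.n_models = n_models  # odd, so the stage-1 vote cannot tie
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        major = [r for r, lab in zip(X, y) if lab == self.majority_label]
        minor = [(r, lab) for r, lab in zip(X, y) if lab != self.majority_label]
        minor_rows = [r for r, _ in minor]
        k = min(len(major), len(minor_rows))
        # Stage 1: each model sees a balanced sample (undersampled majority
        # plus every minority record, relabeled as one combined class).
        self.stage1 = []
        for _ in range(self.n_models):
            sample = rng.sample(major, k)
            Xb = sample + minor_rows
            yb = ["majority"] * k + ["minority"] * len(minor_rows)
            self.stage1.append(NearestCentroid().fit(Xb, yb))
        # Stage 2: trained on minority records only, with original labels.
        self.stage2 = NearestCentroid().fit(minor_rows, [lab for _, lab in minor])
        return self

    def predict_one(self, row):
        # Majority vote stands in for the meta-classifier (an assumption).
        votes = Counter(m.predict_one(row) for m in self.stage1)
        if votes["majority"] >= votes["minority"]:
            return self.majority_label
        return self.stage2.predict_one(row)


# Synthetic, imbalanced toy data: 40 majority records, 4 + 3 minority records.
X = [[(i % 5) * 0.1, (i // 5) * 0.1] for i in range(40)]
y = ["stage_low"] * 40
X += [[5.0 + 0.1 * i, 5.0] for i in range(4)]
y += ["stage_mid"] * 4
X += [[10.0 + 0.1 * i, 0.0] for i in range(3)]
y += ["stage_high"] * 3

model = HierarchicalSketch(majority_label="stage_low").fit(X, y)
preds = [model.predict_one(p) for p in ([0.2, 0.2], [5.1, 5.0], [10.0, 0.1])]
print(preds)
```

On this toy data the three query points should be assigned to stage_low, stage_mid, and stage_high respectively. A trained meta-classifier over the base models' outputs, as the abstract's wording suggests, would replace the vote in predict_one; the sketch only shows how the two subtasks are chained.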