
Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records


Bibliographic details
Main authors: Pikoula, Maria; Quint, Jennifer Kathleen; Nissen, Francis; Hemingway, Harry; Smeeth, Liam; Denaxas, Spiros
Format: Online; Article; Text
Language: English
Published: BioMed Central, 2019
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6472089/
https://www.ncbi.nlm.nih.gov/pubmed/30999919
http://dx.doi.org/10.1186/s12911-019-0805-0
Collection: PubMed
Description:
BACKGROUND: COPD is a highly heterogeneous disease composed of different phenotypes with different aetiological and prognostic profiles, and current classification systems do not fully capture this heterogeneity. In this study we sought to discover, describe and validate COPD subtypes using cluster analysis on data derived from electronic health records.
METHODS: We applied two unsupervised learning algorithms (k-means and hierarchical clustering) to 30,961 current and former smokers diagnosed with COPD, using linked national structured electronic health records in England available through the CALIBER resource. We used 15 clinical features, including risk factors and comorbidities, and performed dimensionality reduction using multiple correspondence analysis. We compared the association between cluster membership and COPD exacerbations and respiratory and cardiovascular death, with 10,736 deaths recorded over 146,466 person-years of follow-up. We also implemented and tested a process to assign unseen patients to clusters using a decision tree classifier.
RESULTS: We identified and characterized five COPD patient clusters with distinct characteristics with respect to demographics, comorbidities, risk of death and exacerbations. Four of the subgroups were associated with 1) anxiety/depression; 2) severe airflow obstruction and frailty; 3) cardiovascular disease and diabetes; and 4) obesity/atopy. A fifth cluster was associated with a low prevalence of most comorbid conditions.
CONCLUSIONS: COPD patients can be sub-classified into groups with differing risk factors, comorbidities, and prognosis, based on data included in their primary care records. The identified clusters confirm findings of previous clustering studies and draw attention to anxiety and depression as important drivers of the disease in young, female patients.
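The follow-up figures in METHODS imply a crude death rate for the cohort as a whole. A worked calculation, assuming the 10,736 recorded deaths and the 146,466 person-years refer to the same cohort and ignoring differences by cluster, age or sex (the figure is unadjusted and is not reported in the record itself):

```latex
\text{crude death rate}
  = \frac{\text{deaths}}{\text{person-years}} \times 1000
  = \frac{10\,736}{146\,466} \times 1000
  \approx 73.3 \ \text{deaths per } 1000 \ \text{person-years}
```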
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-019-0805-0) contains supplementary material, which is available to authorized users.
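The pipeline summarized in METHODS (multiple correspondence analysis, k-means and hierarchical clustering, then a decision tree to assign unseen patients) can be sketched as below. This is a minimal illustration, not the authors' code: it assumes Python with NumPy and scikit-learn, encodes the 15 clinical features as hypothetical 0/1 indicators on synthetic data, computes MCA as a textbook SVD of the indicator matrix (the helper `mca_coordinates` is hypothetical), and picks 5 MCA dimensions, Ward linkage, a tree depth of 6 and an adjusted Rand index comparison purely for illustration.

```python
# Minimal sketch, NOT the authors' code: data, settings and helper names are hypothetical.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.tree import DecisionTreeClassifier


def mca_coordinates(binary_features: np.ndarray, n_components: int) -> np.ndarray:
    """Row principal coordinates from multiple correspondence analysis (MCA).

    Each 0/1 feature is expanded into two indicator columns (absent, present),
    giving the complete disjunctive table; MCA is correspondence analysis of it.
    """
    indicator = np.hstack([1 - binary_features, binary_features]).astype(float)
    indicator = indicator[:, indicator.sum(axis=0) > 0]  # drop empty categories
    P = indicator / indicator.sum()          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    # F = D_r^(-1/2) U Sigma, truncated to the first n_components dimensions
    return U[:, :n_components] * sigma[:n_components] / np.sqrt(r)[:, None]


# Hypothetical cohort: 600 patients x 15 binary features (1 = risk factor/comorbidity recorded).
rng = np.random.default_rng(0)
X = (rng.random((600, 15)) < 0.25).astype(int)
X_train, X_unseen = X[:500], X[500:]

# 1) Dimensionality reduction with MCA (5 dimensions is an arbitrary choice here).
coords = mca_coordinates(X_train, n_components=5)

# 2) Two unsupervised algorithms, each asked for five clusters.
kmeans_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(coords)
hier_labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(coords)
agreement = adjusted_rand_score(kmeans_labels, hier_labels)  # 1.0 = same partition up to relabelling

# 3) Decision tree mapping raw features to k-means clusters, used to assign unseen patients.
tree = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_train, kmeans_labels)
unseen_labels = tree.predict(X_unseen)

print(f"k-means vs hierarchical agreement (ARI): {agreement:.2f}")
print("unseen patients per cluster:", np.bincount(unseen_labels, minlength=5))
```

Training the tree on the raw features rather than on the MCA coordinates is a design assumption: it lets a new patient be assigned from their recorded conditions alone, without projecting them into the MCA space first.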
Record ID: pubmed-6472089
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Medical Informatics and Decision Making (BMC Med Inform Decis Mak), Research Article, BioMed Central, published 2019-04-18
License: © The Author(s). 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.