Machine Learning Methods for the Diagnosis of Chronic Obstructive Pulmonary Disease in Healthy Subjects: Retrospective Observational Cohort Study

BACKGROUND: Airflow limitation is a critical physiological feature in chronic obstructive pulmonary disease (COPD), for which long-term exposure to noxious substances, including tobacco smoke, is an established risk. However, not all long-term smokers develop COPD, meaning that other risk factors ex...

Bibliographic Details
Main Authors: Muro, Shigeo; Ishida, Masato; Horie, Yoshiharu; Takeuchi, Wataru; Nakagawa, Shunki; Ban, Hideyuki; Nakagawa, Tohru; Kitamura, Tetsuhisa
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8293159/
https://www.ncbi.nlm.nih.gov/pubmed/34255684
http://dx.doi.org/10.2196/24796
_version_ 1783724971941429248
author Muro, Shigeo
Ishida, Masato
Horie, Yoshiharu
Takeuchi, Wataru
Nakagawa, Shunki
Ban, Hideyuki
Nakagawa, Tohru
Kitamura, Tetsuhisa
collection PubMed
description BACKGROUND: Airflow limitation is a critical physiological feature in chronic obstructive pulmonary disease (COPD), for which long-term exposure to noxious substances, including tobacco smoke, is an established risk. However, not all long-term smokers develop COPD, meaning that other risk factors exist.
OBJECTIVE: This study aimed to predict the risk factors for COPD diagnosis using machine learning in an annual medical check-up database.
METHODS: In this retrospective observational cohort study (ARTDECO [Analysis of Risk Factors to Detect COPD]), annual medical check-up records for all Hitachi Ltd employees in Japan collected from April 1998 to March 2019 were analyzed. Employees who provided informed consent via an opt-out model were screened and those aged 30 to 75 years without a prior diagnosis of COPD/asthma or a history of cancer were included. The database included clinical measurements (eg, pulmonary function tests) and questionnaire responses. To predict the risk factors for COPD diagnosis within a 3-year period, the Gradient Boosting Decision Tree machine learning (XGBoost) method was applied as a primary approach, with logistic regression as a secondary method. A diagnosis of COPD was made when the ratio of the prebronchodilator forced expiratory volume in 1 second (FEV1) to prebronchodilator forced vital capacity (FVC) was <0.7 during two consecutive examinations.
RESULTS: Of the 26,101 individuals screened, 1213 met the exclusion criteria, and thus, 24,815 individuals were included in the analysis. The top 10 predictors for COPD diagnosis were FEV1/FVC, smoking status, allergic symptoms, cough, pack years, hemoglobin A1c, serum albumin, mean corpuscular volume, percent predicted vital capacity, and percent predicted value of FEV1. The areas under the receiver operating characteristic curves of the XGBoost model and the logistic regression model were 0.956 and 0.943, respectively.
CONCLUSIONS: Using a machine learning model in this longitudinal database, we identified a number of parameters as risk factors other than smoking exposure or lung function to support general practitioners and occupational health physicians to predict the development of COPD. Further research to confirm our results is warranted, as our analysis involved a database used only in Japan.
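The diagnostic rule stated in the abstract (prebronchodilator FEV1/FVC below 0.7 at two consecutive examinations) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the choice to report the index of the second qualifying examination, and the assumption that ratios arrive as a complete, chronologically ordered list with no missing check-ups are all hypothetical and introduced only for illustration.

```python
from typing import Optional, Sequence

# Cutoff taken from the abstract; everything else in this sketch is an assumption.
FEV1_FVC_THRESHOLD = 0.7


def fev1_fvc_ratio(fev1_litres: float, fvc_litres: float) -> float:
    """Return the FEV1/FVC ratio for one spirometry measurement."""
    if fvc_litres <= 0:
        raise ValueError("FVC must be positive")
    return fev1_litres / fvc_litres


def first_copd_exam(
    ratios: Sequence[float], threshold: float = FEV1_FVC_THRESHOLD
) -> Optional[int]:
    """Return the index of the examination that completes the first pair of
    consecutive examinations with ratio < threshold, or None if no such
    pair exists. Assumes `ratios` is ordered by examination date."""
    for i in range(1, len(ratios)):
        if ratios[i - 1] < threshold and ratios[i] < threshold:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical yearly ratios for one person (not data from the study).
    yearly = [0.78, 0.69, 0.72, 0.68, 0.66]
    print(first_copd_exam(yearly))
```

With the hypothetical series above, a single low value (0.69) followed by a recovery does not count; only the last two examinations together satisfy the rule.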
format Online
Article
Text
id pubmed-8293159
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8293159 2021-08-03 JMIR Med Inform Original Paper JMIR Publications 2021-07-06 /pmc/articles/PMC8293159/ /pubmed/34255684 http://dx.doi.org/10.2196/24796 Text en
©Shigeo Muro, Masato Ishida, Yoshiharu Horie, Wataru Takeuchi, Shunki Nakagawa, Hideyuki Ban, Tohru Nakagawa, Tetsuhisa Kitamura. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 06.07.2021.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title Machine Learning Methods for the Diagnosis of Chronic Obstructive Pulmonary Disease in Healthy Subjects: Retrospective Observational Cohort Study
topic Original Paper