Comparison and development of machine learning tools for the prediction of chronic obstructive pulmonary disease in the Chinese population

Bibliographic Details
Main authors: Ma, Xia; Wu, Yanping; Zhang, Ling; Yuan, Weilan; Yan, Li; Fan, Sha; Lian, Yunzhi; Zhu, Xia; Gao, Junhui; Zhao, Jiangman; Zhang, Ping; Tang, Hui; Jia, Weihua
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7110698/
https://www.ncbi.nlm.nih.gov/pubmed/32234053
http://dx.doi.org/10.1186/s12967-020-02312-0
_version_ 1783513110165848064
author Ma, Xia
Wu, Yanping
Zhang, Ling
Yuan, Weilan
Yan, Li
Fan, Sha
Lian, Yunzhi
Zhu, Xia
Gao, Junhui
Zhao, Jiangman
Zhang, Ping
Tang, Hui
Jia, Weihua
author_sort Ma, Xia
collection PubMed
description BACKGROUND: Chronic obstructive pulmonary disease (COPD) is a major public health problem and a leading cause of mortality worldwide. However, early-stage COPD usually goes unrecognized and undiagnosed, so a risk model that predicts COPD development is needed.
METHODS: A total of 441 COPD patients and 192 control subjects were recruited, and 101 single-nucleotide polymorphisms (SNPs) were genotyped using the MassArray assay. Using 5 clinical features together with the SNPs, 6 predictive models were established and evaluated in the training set and test set by the confusion matrix, AU-ROC, AU-PRC, sensitivity (recall), specificity, accuracy, F1 score, MCC, PPV (precision) and NPV. The selected features were ranked.
RESULTS: Nine SNPs were significantly associated with COPD. Among them, 6 SNPs (rs1007052, OR = 1.671, P = 0.010; rs2910164, OR = 1.416, P < 0.037; rs473892, OR = 1.473, P < 0.044; rs161976, OR = 1.594, P < 0.044; rs159497, OR = 1.445, P < 0.045; and rs9296092, OR = 1.832, P < 0.045) were risk factors for COPD, while 3 SNPs (rs8192288, OR = 0.593, P < 0.015; rs20541, OR = 0.669, P < 0.018; and rs12922394, OR = 0.651, P < 0.022) were protective factors for COPD development. In the training set, KNN, LR, SVM, DT and XGBoost obtained AU-ROC values above 0.82 and AU-PRC values above 0.92. Among these models, XGBoost obtained the highest AU-ROC (0.94), AU-PRC (0.97), accuracy (0.91), precision (0.95), F1 score (0.94), MCC (0.77) and specificity (0.85), while MLP obtained the highest sensitivity (recall) (0.99) and NPV (0.87). In the validation set, KNN, LR and XGBoost obtained AU-ROC and AU-PRC values above 0.80 and 0.85, respectively. KNN had the highest precision (0.82), and KNN and LR shared the highest accuracy (0.81) and the highest F1 score (0.86). Both DT and MLP obtained sensitivity (recall) and NPV values above 0.94 and 0.84, respectively. In the feature importance analyses, AQCI, age, and BMI had the greatest impact on the predictive abilities of the models, while SNPs, sex and smoking were less important.
CONCLUSIONS: The KNN, LR and XGBoost models showed excellent overall predictive power, and machine learning tools combining clinical and SNP features were suitable for predicting the risk of COPD development.
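The threshold-based metrics named in METHODS (sensitivity, specificity, accuracy, F1 score, MCC, PPV and NPV) all follow from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code; the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive standard binary-classification metrics from confusion-matrix counts.

    tp/fn: cases correctly / incorrectly classified (positives)
    tn/fp: controls correctly / incorrectly classified (negatives)
    """
    sensitivity = _ratio(tp, tp + fn)            # recall, true positive rate
    specificity = _ratio(tn, tn + fp)            # true negative rate
    ppv = _ratio(tp, tp + fp)                    # precision
    npv = _ratio(tn, tn + fn)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    # Matthews correlation coefficient: balanced even with unequal class sizes.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, mcc_denominator)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the article).
    for name, value in confusion_metrics(tp=80, fp=10, tn=30, fn=8).items():
        print(f"{name}: {value:.3f}")
```

AU-ROC and AU-PRC are not covered here because they need the models' predicted scores rather than a single confusion matrix. With roughly twice as many cases as controls, as in this cohort, MCC and NPV are useful complements to accuracy and F1 because they are sensitive to performance on the smaller control class.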
format Online
Article
Text
id pubmed-7110698
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7110698 2020-04-07 Comparison and development of machine learning tools for the prediction of chronic obstructive pulmonary disease in the Chinese population Ma, Xia Wu, Yanping Zhang, Ling Yuan, Weilan Yan, Li Fan, Sha Lian, Yunzhi Zhu, Xia Gao, Junhui Zhao, Jiangman Zhang, Ping Tang, Hui Jia, Weihua J Transl Med Research BioMed Central 2020-03-31 /pmc/articles/PMC7110698/ /pubmed/32234053 http://dx.doi.org/10.1186/s12967-020-02312-0 Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Comparison and development of machine learning tools for the prediction of chronic obstructive pulmonary disease in the Chinese population
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7110698/
https://www.ncbi.nlm.nih.gov/pubmed/32234053
http://dx.doi.org/10.1186/s12967-020-02312-0