Machine learning approaches for the genomic prediction of rheumatoid arthritis and systemic lupus erythematosus

BACKGROUND: Rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) are autoimmune rheumatic diseases that share a complex genetic background and common clinical features. The purpose of this study was to construct machine learning (ML) models for the genomic prediction of RA and SLE. METHODS: A...

Bibliographic Details
Main authors: Chung, Chih-Wei; Hsiao, Tzu-Hung; Huang, Chih-Jen; Chen, Yen-Ju; Chen, Hsin-Hua; Lin, Ching-Heng; Chou, Seng-Cho; Chen, Tzer-Shyong; Chung, Yu-Fang; Yang, Hwai-I; Chen, Yi-Ming
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8666017/
https://www.ncbi.nlm.nih.gov/pubmed/34895289
http://dx.doi.org/10.1186/s13040-021-00284-5
author Chung, Chih-Wei
Hsiao, Tzu-Hung
Huang, Chih-Jen
Chen, Yen-Ju
Chen, Hsin-Hua
Lin, Ching-Heng
Chou, Seng-Cho
Chen, Tzer-Shyong
Chung, Yu-Fang
Yang, Hwai-I
Chen, Yi-Ming
collection PubMed
description BACKGROUND: Rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) are autoimmune rheumatic diseases that share a complex genetic background and common clinical features. The purpose of this study was to construct machine learning (ML) models for the genomic prediction of RA and SLE. METHODS: A total of 2,094 patients with RA and 2,190 patients with SLE were enrolled from the Taichung Veterans General Hospital cohort of the Taiwan Precision Medicine Initiative. Genome-wide single nucleotide polymorphism (SNP) data were obtained using the Taiwan Biobank version 2 array. The ML methods used were logistic regression (LR), random forest (RF), support vector machine (SVM), gradient tree boosting (GTB), and extreme gradient boosting (XGB). SHapley Additive exPlanation (SHAP) values were calculated to clarify the contribution of each SNP. Human leukocyte antigen (HLA) imputation was performed using the HLA Genotype Imputation with Attribute Bagging package. RESULTS: Compared with LR (area under the curve [AUC] = 0.8247), RF (AUC = 0.9844), SVM (AUC = 0.9828), GTB (AUC = 0.9932), and XGB (AUC = 0.9919) exhibited significantly better prediction performance. The top 20 genes by feature importance and SHAP values included HLA class II alleles. Imputed HLA-DQA1*05:01, DQB1*02:01, and DRB1*03:01 were associated with SLE, whereas HLA-DQA1*03:03, DQB1*04:01, and DRB1*04:05 were more frequently observed in patients with RA. CONCLUSIONS: We established ML methods for the genomic prediction of RA and SLE. Genetic variations at HLA-DQA1, HLA-DQB1, and HLA-DRB1 were crucial for differentiating RA from SLE. Future studies are required to verify our results and explore their mechanistic explanation. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13040-021-00284-5.
format Online
Article
Text
id pubmed-8666017
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8666017 2021-12-13 Machine learning approaches for the genomic prediction of rheumatoid arthritis and systemic lupus erythematosus Chung, Chih-Wei; Hsiao, Tzu-Hung; Huang, Chih-Jen; Chen, Yen-Ju; Chen, Hsin-Hua; Lin, Ching-Heng; Chou, Seng-Cho; Chen, Tzer-Shyong; Chung, Yu-Fang; Yang, Hwai-I; Chen, Yi-Ming BioData Min Research BioMed Central 2021-12-11 /pmc/articles/PMC8666017/ /pubmed/34895289 http://dx.doi.org/10.1186/s13040-021-00284-5 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Machine learning approaches for the genomic prediction of rheumatoid arthritis and systemic lupus erythematosus
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8666017/
https://www.ncbi.nlm.nih.gov/pubmed/34895289
http://dx.doi.org/10.1186/s13040-021-00284-5
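
The description field above compares the models by area under the ROC curve (AUC). As a reading aid, the sketch below computes AUC through its rank-based interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `roc_auc`, the toy labels, and the toy scores are assumptions made for illustration; they are not taken from the study, and the record does not state which disease the authors coded as the positive class.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: sequence of 0/1 values, where 1 marks the positive class.
    scores: model outputs, higher meaning "more likely positive".
    A positive/negative pair with equal scores counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): three positives, three negatives.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

On this scale, 0.5 corresponds to a classifier that ranks cases no better than chance and 1.0 to a perfect ranking, which is the sense in which the reported values for the tree-based models and SVM exceed that of logistic regression. The pairwise loop is quadratic in the number of examples and is meant only to make the definition concrete.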