Machine learning approaches for the genomic prediction of rheumatoid arthritis and systemic lupus erythematosus
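Main authors: | Chung, Chih-Wei; Hsiao, Tzu-Hung; Huang, Chih-Jen; Chen, Yen-Ju; Chen, Hsin-Hua; Lin, Ching-Heng; Chou, Seng-Cho; Chen, Tzer-Shyong; Chung, Yu-Fang; Yang, Hwai-I; Chen, Yi-Ming |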
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8666017/ https://www.ncbi.nlm.nih.gov/pubmed/34895289 http://dx.doi.org/10.1186/s13040-021-00284-5 |
author | Chung, Chih-Wei; Hsiao, Tzu-Hung; Huang, Chih-Jen; Chen, Yen-Ju; Chen, Hsin-Hua; Lin, Ching-Heng; Chou, Seng-Cho; Chen, Tzer-Shyong; Chung, Yu-Fang; Yang, Hwai-I; Chen, Yi-Ming |
collection | PubMed |
description | BACKGROUND: Rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) are autoimmune rheumatic diseases that share a complex genetic background and common clinical features. This study’s purpose was to construct machine learning (ML) models for the genomic prediction of RA and SLE. METHODS: A total of 2,094 patients with RA and 2,190 patients with SLE were enrolled from the Taichung Veterans General Hospital cohort of the Taiwan Precision Medicine Initiative. Genome-wide single nucleotide polymorphism (SNP) data were obtained using the Taiwan Biobank version 2 array. The ML methods used were logistic regression (LR), random forest (RF), support vector machine (SVM), gradient tree boosting (GTB), and extreme gradient boosting (XGB). SHapley Additive exPlanation (SHAP) values were calculated to clarify the contribution of each SNP. Human leukocyte antigen (HLA) imputation was performed using the HLA Genotype Imputation with Attribute Bagging (HIBAG) package. RESULTS: Compared with LR (area under the curve [AUC] = 0.8247), RF (AUC = 0.9844), SVM (AUC = 0.9828), GTB (AUC = 0.9932), and XGB (AUC = 0.9919) exhibited significantly better prediction performance. The top 20 genes by feature importance and SHAP values included HLA class II alleles. We found that imputed HLA-DQA1*05:01, DQB1*02:01, and DRB1*03:01 were associated with SLE, whereas HLA-DQA1*03:03, DQB1*04:01, and DRB1*04:05 were more frequently observed in patients with RA. CONCLUSIONS: We established ML methods for the genomic prediction of RA and SLE. Genetic variations at HLA-DQA1, HLA-DQB1, and HLA-DRB1 were crucial for differentiating RA from SLE. Future studies are required to verify our results and explore their mechanistic basis. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13040-021-00284-5. |
format | Online Article Text |
id | pubmed-8666017 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BioData Min (section: Research) |
record date | 2021-12-13 |
published online | 2021-12-11 |
rights | © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Machine learning approaches for the genomic prediction of rheumatoid arthritis and systemic lupus erythematosus |
topic | Research |
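
The AUC values in the abstract summarize how well each model ranks patients with one disease above patients with the other. As a minimal sketch (not the authors' code), the function below computes AUC from predicted scores using the pairwise (Mann-Whitney) definition. The function name `roc_auc` is hypothetical, and coding SLE as 1 and RA as 0 is an arbitrary assumption for illustration, not taken from the study. A tie between a positive and a negative score counts as half a concordant pair.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC via the pairwise (Mann-Whitney) formulation.

    labels: 1 for the positive class (assumed here: SLE), 0 for the
            negative class (assumed here: RA).
    scores: predicted probability of the positive class for each patient.
    Returns the fraction of (positive, negative) pairs in which the
    positive case is scored higher; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # concordant pair: positive ranked above negative
        elif p == n:
            wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores: 3 of the 4 positive/negative pairs are ranked correctly.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

This pairwise form compares every positive with every negative, so it is meant for readability rather than speed; scikit-learn's `roc_auc_score` computes the same quantity more efficiently for cohorts of a few thousand patients.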