
Explainable artificial intelligence model for identifying COVID-19 gene biomarkers

AIM: COVID-19 has revealed the need for fast and reliable methods to assist clinicians in diagnosing the disease. This article presents a model that applies explainable artificial intelligence (XAI) methods based on machine learning techniques on COVID-19 metagenomic next-generation sequencing (mNGS...


Bibliographic Details
Main Authors: Yagin, Fatma Hilal; Cicek, İpek Balikci; Alkhateeb, Abedalrhman; Yagin, Burak; Colak, Cemil; Azzeh, Mohammad; Akbulut, Sami
Format: Online Article Text
Language: English
Published: Elsevier Ltd. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9889119/
https://www.ncbi.nlm.nih.gov/pubmed/36738712
http://dx.doi.org/10.1016/j.compbiomed.2023.106619
_version_ 1784880664677449728
author Yagin, Fatma Hilal
Cicek, İpek Balikci
Alkhateeb, Abedalrhman
Yagin, Burak
Colak, Cemil
Azzeh, Mohammad
Akbulut, Sami
author_sort Yagin, Fatma Hilal
collection PubMed
description AIM: COVID-19 has revealed the need for fast and reliable methods to assist clinicians in diagnosing the disease. This article presents a model that applies explainable artificial intelligence (XAI) methods based on machine learning techniques to COVID-19 metagenomic next-generation sequencing (mNGS) samples. METHODS: The data set used in the study contains 15,979 gene expression features for 234 patients, of whom 141 (60.3%) were COVID-19 negative and 93 (39.7%) were COVID-19 positive. The least absolute shrinkage and selection operator (LASSO) method was applied to select genes associated with COVID-19. The Support Vector Machine - Synthetic Minority Oversampling Technique (SVM-SMOTE) method was used to handle the class imbalance problem. Logistic regression (LR), SVM, random forest (RF), and extreme gradient boosting (XGBoost) models were constructed to predict COVID-19. An explainable approach based on the local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) methods was applied to determine COVID-19-associated biomarker candidate genes and improve the final model's interpretability. RESULTS: For the diagnosis of COVID-19, the XGBoost model (accuracy: 0.930) outperformed the RF (accuracy: 0.912), SVM (accuracy: 0.877), and LR (accuracy: 0.912) models. According to SHAP, the three most important genes associated with COVID-19 were IFI27, LGR6, and FAM83A. The LIME results showed that a high level of IFI27 gene expression, in particular, increased the probability of the positive class. CONCLUSIONS: The proposed model (XGBoost) was able to predict COVID-19 successfully. The results show that machine learning combined with LIME and SHAP can explain the biomarker prediction for COVID-19 and give clinicians an intuitive, interpretable view of the impact of risk factors in the model.
format Online
Article
Text
id pubmed-9889119
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-9889119 2023-02-01 Explainable artificial intelligence model for identifying COVID-19 gene biomarkers Yagin, Fatma Hilal Cicek, İpek Balikci Alkhateeb, Abedalrhman Yagin, Burak Colak, Cemil Azzeh, Mohammad Akbulut, Sami Comput Biol Med Article AIM: COVID-19 has revealed the need for fast and reliable methods to assist clinicians in diagnosing the disease. This article presents a model that applies explainable artificial intelligence (XAI) methods based on machine learning techniques to COVID-19 metagenomic next-generation sequencing (mNGS) samples. METHODS: The data set used in the study contains 15,979 gene expression features for 234 patients, of whom 141 (60.3%) were COVID-19 negative and 93 (39.7%) were COVID-19 positive. The least absolute shrinkage and selection operator (LASSO) method was applied to select genes associated with COVID-19. The Support Vector Machine - Synthetic Minority Oversampling Technique (SVM-SMOTE) method was used to handle the class imbalance problem. Logistic regression (LR), SVM, random forest (RF), and extreme gradient boosting (XGBoost) models were constructed to predict COVID-19. An explainable approach based on the local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) methods was applied to determine COVID-19-associated biomarker candidate genes and improve the final model's interpretability. RESULTS: For the diagnosis of COVID-19, the XGBoost model (accuracy: 0.930) outperformed the RF (accuracy: 0.912), SVM (accuracy: 0.877), and LR (accuracy: 0.912) models. According to SHAP, the three most important genes associated with COVID-19 were IFI27, LGR6, and FAM83A. The LIME results showed that a high level of IFI27 gene expression, in particular, increased the probability of the positive class. CONCLUSIONS: The proposed model (XGBoost) was able to predict COVID-19 successfully. The results show that machine learning combined with LIME and SHAP can explain the biomarker prediction for COVID-19 and give clinicians an intuitive, interpretable view of the impact of risk factors in the model. Elsevier Ltd. 2023-03 2023-02-01 /pmc/articles/PMC9889119/ /pubmed/36738712 http://dx.doi.org/10.1016/j.compbiomed.2023.106619 Text en © 2023 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Explainable artificial intelligence model for identifying COVID-19 gene biomarkers
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9889119/
https://www.ncbi.nlm.nih.gov/pubmed/36738712
http://dx.doi.org/10.1016/j.compbiomed.2023.106619
work_keys_str_mv AT yaginfatmahilal explainableartificialintelligencemodelforidentifyingcovid19genebiomarkers
AT cicekipekbalikci explainableartificialintelligencemodelforidentifyingcovid19genebiomarkers
AT alkhateebabedalrhman explainableartificialintelligencemodelforidentifyingcovid19genebiomarkers
AT yaginburak explainableartificialintelligencemodelforidentifyingcovid19genebiomarkers
AT colakcemil explainableartificialintelligencemodelforidentifyingcovid19genebiomarkers
AT azzehmohammad explainableartificialintelligencemodelforidentifyingcovid19genebiomarkers
AT akbulutsami explainableartificialintelligencemodelforidentifyingcovid19genebiomarkers
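
Illustrative note on the SHAP step named in the description: SHAP attributes a single prediction to individual features using Shapley values. The sketch below computes exact Shapley values for one sample by enumerating every feature coalition. It is a toy under stated assumptions, not the authors' pipeline: the scoring function `toy_score`, its weights, the feature names `gene_a`, `gene_b`, `gene_c`, and the all-zero baseline are hypothetical and do not come from the article. Exhaustive enumeration is only practical for a handful of features; explanation libraries generally rely on faster algorithms or approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating all feature coalitions."""
    features = list(x)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            # Shapley weight for a coalition of size k that excludes f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Coalition members take the sample's value; all others stay at baseline.
                without = {g: (x[g] if g in subset else baseline[g]) for g in features}
                with_f = dict(without)
                with_f[f] = x[f]
                total += weight * (predict(with_f) - predict(without))
        phi[f] = total
    return phi


def toy_score(v):
    # Hypothetical score with made-up weights and one interaction term.
    # This is NOT the XGBoost model described in the record.
    return 2.0 * v["gene_a"] - 1.0 * v["gene_b"] + 0.5 * v["gene_c"] + 0.5 * v["gene_a"] * v["gene_c"]


# Hypothetical expression values for one sample and an all-zero reference point.
sample = {"gene_a": 1.0, "gene_b": 0.5, "gene_c": 2.0}
baseline = {"gene_a": 0.0, "gene_b": 0.0, "gene_c": 0.0}

phi = shapley_values(toy_score, sample, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions add up to the difference between the score for the sample and the score for the baseline, which is the property that lets a per-gene contribution be read as a share of one prediction. The interaction term between `gene_a` and `gene_c` is split equally between those two features.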