
Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence

The SARS-CoV-2 pandemic highlighted the need for software tools that could facilitate patient triage regarding potential disease severity or even death. In this article, an ensemble of Machine Learning (ML) algorithms is evaluated for predicting the severity of patients' condition, using plasma p...


Bibliographic Details
Main Authors:	Dimitsaki, Stella, Gavriilidis, George I., Dimitriadis, Vlasios K., Natsiavas, Pantelis
Format:	Online Article Text
Language:	English
Published:	Elsevier B.V. 2023
Subjects:	Research Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9846931/ https://www.ncbi.nlm.nih.gov/pubmed/36868685 http://dx.doi.org/10.1016/j.artmed.2023.102490

_version_	1784871312380919808
author	Dimitsaki, Stella Gavriilidis, George I. Dimitriadis, Vlasios K. Natsiavas, Pantelis
author_facet	Dimitsaki, Stella Gavriilidis, George I. Dimitriadis, Vlasios K. Natsiavas, Pantelis
author_sort	Dimitsaki, Stella
collection	PubMed
description	The SARS-CoV-2 pandemic highlighted the need for software tools that could facilitate patient triage regarding potential disease severity or even death. In this article, an ensemble of Machine Learning (ML) algorithms is evaluated for predicting the severity of patients' condition, using plasma proteomics and clinical data as input. An overview of AI-based technical developments to support COVID-19 patient management is presented, outlining the landscape of relevant work. Based on this review, an ensemble of ML algorithms that analyze clinical and biological data (i.e., plasma proteomics) of COVID-19 patients is designed and deployed to evaluate the potential use of AI for early COVID-19 patient triage. The proposed pipeline is evaluated using three publicly available datasets for training and testing. Three ML “tasks” are defined, and several algorithms are tested through a hyperparameter tuning method to identify the highest-performance models. As overfitting is one of the typical pitfalls for such approaches (mainly due to the small size of the training/validation datasets), a variety of evaluation metrics are used to mitigate this risk. In the evaluation procedure, recall scores ranged from 0.6 to 0.74 and F1-scores from 0.62 to 0.75. The best performance is observed with the Multi-Layer Perceptron (MLP) and Support Vector Machines (SVM) algorithms. Additionally, input data (proteomics and clinical data) were ranked based on their Shapley additive explanation (SHAP) values and evaluated for their prognostic capacity and immuno-biological plausibility. This “interpretable” approach revealed that our ML models could discern critical COVID-19 cases predominantly based on the patient's age and on plasma proteins related to B cell dysfunction, hyper-activation of inflammatory pathways like Toll-like receptors, and hypo-activation of developmental and immune pathways like SCF/c-Kit signaling. 
Finally, the computational workflow presented here is corroborated in an independent dataset, which confirms both the superiority of MLP and the implication of the abovementioned predictive biological pathways. Regarding limitations of the presented ML pipeline, the datasets used in this study contain fewer than 1000 observations and a large number of input features, hence constituting a high-dimensional low-sample (HDLS) dataset, which could be sensitive to overfitting. An advantage of the proposed pipeline is that it combines biological data (plasma proteomics) with clinical-phenotypic data. Thus, in principle, the presented approach could enable patient triage in a timely fashion if used with already trained models. However, larger datasets and further systematic validation are needed to confirm the potential clinical value of this approach. The code is available on GitHub: https://github.com/inab-certh/Predicting-COVID-19-severity-through-interpretable-AI-analysis-of-plasma-proteomics.
format	Online Article Text
id	pubmed-9846931
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Elsevier B.V.
record_format	MEDLINE/PubMed
spelling	pubmed-98469312023-01-18 Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence Dimitsaki, Stella Gavriilidis, George I. Dimitriadis, Vlasios K. Natsiavas, Pantelis Artif Intell Med Research Paper The SARS-CoV-2 pandemic highlighted the need for software tools that could facilitate patient triage regarding potential disease severity or even death. In this article, an ensemble of Machine Learning (ML) algorithms is evaluated for predicting the severity of patients' condition, using plasma proteomics and clinical data as input. An overview of AI-based technical developments to support COVID-19 patient management is presented, outlining the landscape of relevant work. Based on this review, an ensemble of ML algorithms that analyze clinical and biological data (i.e., plasma proteomics) of COVID-19 patients is designed and deployed to evaluate the potential use of AI for early COVID-19 patient triage. The proposed pipeline is evaluated using three publicly available datasets for training and testing. Three ML “tasks” are defined, and several algorithms are tested through a hyperparameter tuning method to identify the highest-performance models. As overfitting is one of the typical pitfalls for such approaches (mainly due to the small size of the training/validation datasets), a variety of evaluation metrics are used to mitigate this risk. In the evaluation procedure, recall scores ranged from 0.6 to 0.74 and F1-scores from 0.62 to 0.75. The best performance is observed with the Multi-Layer Perceptron (MLP) and Support Vector Machines (SVM) algorithms. Additionally, input data (proteomics and clinical data) were ranked based on their Shapley additive explanation (SHAP) values and evaluated for their prognostic capacity and immuno-biological plausibility. 
This “interpretable” approach revealed that our ML models could discern critical COVID-19 cases predominantly based on the patient's age and on plasma proteins related to B cell dysfunction, hyper-activation of inflammatory pathways like Toll-like receptors, and hypo-activation of developmental and immune pathways like SCF/c-Kit signaling. Finally, the computational workflow presented here is corroborated in an independent dataset, which confirms both the superiority of MLP and the implication of the abovementioned predictive biological pathways. Regarding limitations of the presented ML pipeline, the datasets used in this study contain fewer than 1000 observations and a large number of input features, hence constituting a high-dimensional low-sample (HDLS) dataset, which could be sensitive to overfitting. An advantage of the proposed pipeline is that it combines biological data (plasma proteomics) with clinical-phenotypic data. Thus, in principle, the presented approach could enable patient triage in a timely fashion if used with already trained models. However, larger datasets and further systematic validation are needed to confirm the potential clinical value of this approach. The code is available on GitHub: https://github.com/inab-certh/Predicting-COVID-19-severity-through-interpretable-AI-analysis-of-plasma-proteomics. Elsevier B.V. 2023-03 2023-01-18 /pmc/articles/PMC9846931/ /pubmed/36868685 http://dx.doi.org/10.1016/j.artmed.2023.102490 Text en © 2023 Elsevier B.V. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. 
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
spellingShingle	Research Paper Dimitsaki, Stella Gavriilidis, George I. Dimitriadis, Vlasios K. Natsiavas, Pantelis Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence
title	Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence
title_full	Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence
title_fullStr	Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence
title_full_unstemmed	Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence
title_short	Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence
title_sort	benchmarking of machine learning classifiers on plasma proteomic for covid-19 severity prediction through interpretable artificial intelligence
topic	Research Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9846931/ https://www.ncbi.nlm.nih.gov/pubmed/36868685 http://dx.doi.org/10.1016/j.artmed.2023.102490
work_keys_str_mv	AT dimitsakistella benchmarkingofmachinelearningclassifiersonplasmaproteomicforcovid19severitypredictionthroughinterpretableartificialintelligence AT gavriilidisgeorgei benchmarkingofmachinelearningclassifiersonplasmaproteomicforcovid19severitypredictionthroughinterpretableartificialintelligence AT dimitriadisvlasiosk benchmarkingofmachinelearningclassifiersonplasmaproteomicforcovid19severitypredictionthroughinterpretableartificialintelligence AT natsiavaspantelis benchmarkingofmachinelearningclassifiersonplasmaproteomicforcovid19severitypredictionthroughinterpretableartificialintelligence
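
The description in this record reports classifier quality as recall and F1-score. As a reading aid, the sketch below shows how these two metrics are computed from predicted and true labels for a binary task. It is an illustration only and is not taken from the authors' repository: the helper names (`Confusion`, `confusion`, `recall`, `precision`, `f1_score`) and the toy labels are assumptions made up for this example, and only the Python standard library is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion counts for the positive class (hypothetical helper)."""
    tp: int  # true positives: positive cases predicted positive
    fp: int  # false positives: negative cases predicted positive
    fn: int  # false negatives: positive cases predicted negative


def confusion(y_true, y_pred, positive=1):
    """Count tp, fp and fn for the label treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return Confusion(tp, fp, fn)


def recall(c):
    """Fraction of truly positive cases that the model flagged."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def precision(c):
    """Fraction of flagged cases that were truly positive."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def f1_score(c):
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Toy labels, invented for illustration (1 = severe, 0 = non-severe).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

c = confusion(y_true, y_pred)
print(f"recall={recall(c):.2f} precision={precision(c):.2f} f1={f1_score(c):.2f}")
```

In a triage setting, recall measures how many severe cases are caught, while F1 balances that against false alarms, which may be why the record reports both.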


Similar Items