An explainable machine learning-driven proposal of pulmonary fibrosis biomarkers


Bibliographic Details
Main authors: Fanidis, Dionysios; Pezoulas, Vasileios C.; Fotiadis, Dimitrios Ι.; Aidinis, Vassilis
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10049879/
https://www.ncbi.nlm.nih.gov/pubmed/37007651
http://dx.doi.org/10.1016/j.csbj.2023.03.043
collection PubMed
description Pulmonary fibrosing diseases are at the epicenter of biomedical research, both because of their increasing prevalence and because of their association with SARS-CoV-2 infections. Research on idiopathic pulmonary fibrosis, the most lethal of the interstitial lung diseases, is in need of new biomarkers and potential disease targets, a goal that could be reached faster with machine learning techniques. In this study, we used Shapley values to explain the decisions made by an ensemble learning model trained to classify samples as either pulmonary fibrosis or steady state based on the expression values of deregulated genes. This process resulted in a full and a laconic set of features that separate the phenotypes at least as well as previously published marker sets. Indicatively, a maximum increase of 6% in specificity and 5% in Matthews correlation coefficient was achieved. Evaluation with an additional independent dataset showed that our feature set has greater generalization potential than the rest. Ultimately, the proposed gene lists are expected to serve not only as new sets of diagnostic marker elements, but also as a target pool for future research initiatives.
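The Shapley-value attribution mentioned in the description can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy scoring function, the gene names (GENE_A, GENE_B, GENE_C) and the all-zero baseline are hypothetical stand-ins, and the exact enumeration of coalitions used here is only practical for a handful of features. It shows how each feature receives a share of the difference between the model output for a sample and the output for a baseline, and how features can then be ranked by the magnitude of that share.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, sample, baseline):
    """Exact Shapley values of `predict` at `sample`, relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only meant for very small feature sets.
    """
    features = list(sample)
    n = len(features)

    def value(coalition):
        # Model output when only the coalition's features take the sample's values.
        x = {f: (sample[f] if f in coalition else baseline[f]) for f in features}
        return predict(x)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical toy model standing in for a trained classifier's score.
def toy_model(x):
    return 2.0 * x["GENE_A"] + x["GENE_B"] * x["GENE_C"]


# Hypothetical expression values; the baseline plays the role of a reference sample.
baseline = {"GENE_A": 0.0, "GENE_B": 0.0, "GENE_C": 0.0}
sample = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 3.0}

phi = shapley_values(toy_model, sample, baseline)

# Rank features by the magnitude of their contribution.
ranking = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)
print(phi, ranking)
```

In this toy case the attributions sum to the difference between the model output for the sample and for the baseline, and the two interacting features split their joint effect equally. For models with many features, approximate estimators are normally used instead of full enumeration.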
format Online
Article
Text
id pubmed-10049879
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-10049879 2023-03-29 An explainable machine learning-driven proposal of pulmonary fibrosis biomarkers Fanidis, Dionysios Pezoulas, Vasileios C. Fotiadis, Dimitrios Ι. Aidinis, Vassilis Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2023-03-25 /pmc/articles/PMC10049879/ /pubmed/37007651 http://dx.doi.org/10.1016/j.csbj.2023.03.043 Text en © 2023 The Authors