Cargando…

Using LLMs and Explainable ML to Analyze Biomarkers at Single-Cell Level for Improved Understanding of Diseases

Single-cell RNA sequencing (scRNA-seq) technology has significantly advanced our understanding of the diversity of cells and how this diversity is implicated in diseases. Yet, translating these findings across various scRNA-seq datasets poses challenges due to technical variability and dataset-speci...

Descripción completa

Detalles Bibliográficos
Autores principales: Elsborg, Jonas, Salvatore, Marco
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10605495/
https://www.ncbi.nlm.nih.gov/pubmed/37892198
http://dx.doi.org/10.3390/biom13101516
_version_ 1785127088134553600
author Elsborg, Jonas
Salvatore, Marco
author_facet Elsborg, Jonas
Salvatore, Marco
author_sort Elsborg, Jonas
collection PubMed
description Single-cell RNA sequencing (scRNA-seq) technology has significantly advanced our understanding of the diversity of cells and how this diversity is implicated in diseases. Yet, translating these findings across various scRNA-seq datasets poses challenges due to technical variability and dataset-specific biases. To overcome this, we present a novel approach that employs both an LLM-based framework and explainable machine learning to facilitate generalization across single-cell datasets and identify gene signatures to capture disease-driven transcriptional changes. Our approach uses scBERT, which harnesses shared transcriptomic features among cell types to establish consistent cell-type annotations across multiple scRNA-seq datasets. Additionally, we employed a symbolic regression algorithm to pinpoint highly relevant, yet minimally redundant models and features for inferring a cell type’s disease state based on its transcriptomic profile. We ascertained the versatility of these cell-specific gene signatures across datasets, showcasing their resilience as molecular markers to pinpoint and characterize disease-associated cell types. The validation was carried out using four publicly available scRNA-seq datasets from both healthy individuals and those suffering from ulcerative colitis (UC). This demonstrates our approach’s efficacy in bridging disparities specific to different datasets, fostering comparative analyses. Notably, the simplicity and symbolic nature of the retrieved gene signatures facilitate their interpretability, allowing us to elucidate underlying molecular disease mechanisms using these models.
format Online
Article
Text
id pubmed-10605495
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-106054952023-10-28 Using LLMs and Explainable ML to Analyze Biomarkers at Single-Cell Level for Improved Understanding of Diseases Elsborg, Jonas Salvatore, Marco Biomolecules Article Single-cell RNA sequencing (scRNA-seq) technology has significantly advanced our understanding of the diversity of cells and how this diversity is implicated in diseases. Yet, translating these findings across various scRNA-seq datasets poses challenges due to technical variability and dataset-specific biases. To overcome this, we present a novel approach that employs both an LLM-based framework and explainable machine learning to facilitate generalization across single-cell datasets and identify gene signatures to capture disease-driven transcriptional changes. Our approach uses scBERT, which harnesses shared transcriptomic features among cell types to establish consistent cell-type annotations across multiple scRNA-seq datasets. Additionally, we employed a symbolic regression algorithm to pinpoint highly relevant, yet minimally redundant models and features for inferring a cell type’s disease state based on its transcriptomic profile. We ascertained the versatility of these cell-specific gene signatures across datasets, showcasing their resilience as molecular markers to pinpoint and characterize disease-associated cell types. The validation was carried out using four publicly available scRNA-seq datasets from both healthy individuals and those suffering from ulcerative colitis (UC). This demonstrates our approach’s efficacy in bridging disparities specific to different datasets, fostering comparative analyses. Notably, the simplicity and symbolic nature of the retrieved gene signatures facilitate their interpretability, allowing us to elucidate underlying molecular disease mechanisms using these models. MDPI 2023-10-12 /pmc/articles/PMC10605495/ /pubmed/37892198 http://dx.doi.org/10.3390/biom13101516 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Elsborg, Jonas
Salvatore, Marco
Using LLMs and Explainable ML to Analyze Biomarkers at Single-Cell Level for Improved Understanding of Diseases
title Using LLMs and Explainable ML to Analyze Biomarkers at Single-Cell Level for Improved Understanding of Diseases
title_full Using LLMs and Explainable ML to Analyze Biomarkers at Single-Cell Level for Improved Understanding of Diseases
title_fullStr Using LLMs and Explainable ML to Analyze Biomarkers at Single-Cell Level for Improved Understanding of Diseases
title_full_unstemmed Using LLMs and Explainable ML to Analyze Biomarkers at Single-Cell Level for Improved Understanding of Diseases
title_short Using LLMs and Explainable ML to Analyze Biomarkers at Single-Cell Level for Improved Understanding of Diseases
title_sort using llms and explainable ml to analyze biomarkers at single-cell level for improved understanding of diseases
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10605495/
https://www.ncbi.nlm.nih.gov/pubmed/37892198
http://dx.doi.org/10.3390/biom13101516
work_keys_str_mv AT elsborgjonas usingllmsandexplainablemltoanalyzebiomarkersatsinglecelllevelforimprovedunderstandingofdiseases
AT salvatoremarco usingllmsandexplainablemltoanalyzebiomarkersatsinglecelllevelforimprovedunderstandingofdiseases