
Differentiating between liver diseases by applying multiclass machine learning approaches to transcriptomics of liver tissue or blood-based samples



Bibliographic Details
Main Authors: Listopad, Stanislav; Magnan, Christophe; Asghar, Aliya; Stolz, Andrew; Tayek, John A.; Liu, Zhang-Xu; Morgan, Timothy R.; Norden-Krichmar, Trina M.
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9472076/
https://www.ncbi.nlm.nih.gov/pubmed/36119721
http://dx.doi.org/10.1016/j.jhepr.2022.100560
collection PubMed
description BACKGROUND & AIMS: Liver disease carries significant healthcare burden and frequently requires a combination of blood tests, imaging, and invasive liver biopsy to diagnose. Distinguishing between inflammatory liver diseases, which may have similar clinical presentations, is particularly challenging. In this study, we implemented a machine learning pipeline for the identification of diagnostic gene expression biomarkers across several alcohol-associated and non-alcohol-associated liver diseases, using either liver tissue or blood-based samples.

METHODS: We collected peripheral blood mononuclear cells (PBMCs) and liver tissue samples from participants with alcohol-associated hepatitis (AH), alcohol-associated cirrhosis (AC), non-alcohol-associated fatty liver disease, chronic HCV infection, and healthy controls. We performed RNA sequencing (RNA-seq) on 137 PBMC samples and 67 liver tissue samples. Using gene expression data, we implemented a machine learning feature selection and classification pipeline to identify diagnostic biomarkers which distinguish between the liver disease groups. The liver tissue results were validated using a public independent RNA-seq dataset. The biomarkers were computationally validated for biological relevance using pathway analysis tools.

RESULTS: Utilizing liver tissue RNA-seq data, we distinguished between AH, AC, and healthy conditions with overall accuracies of 90% in our dataset, and 82% in the independent dataset, with 33 genes. Distinguishing 4 liver conditions and healthy controls yielded 91% overall accuracy in our liver tissue dataset with 39 genes, and 75% overall accuracy in our PBMC dataset with 75 genes.

CONCLUSIONS: Our machine learning pipeline was effective at identifying a small set of diagnostic gene biomarkers and classifying several liver diseases using RNA-seq data from liver tissue and PBMCs. The methodologies implemented and genes identified in this study may facilitate future efforts toward a liquid biopsy diagnostic for liver diseases.

LAY SUMMARY: Distinguishing between inflammatory liver diseases without multiple tests can be challenging due to their clinically similar characteristics. To lay the groundwork for the development of a non-invasive blood-based diagnostic across a range of liver diseases, we compared samples from participants with alcohol-associated hepatitis, alcohol-associated cirrhosis, chronic hepatitis C infection, and non-alcohol-associated fatty liver disease. We used a machine learning computational approach to demonstrate that gene expression data generated from either liver tissue or blood samples can be used to discover a small set of gene biomarkers for effective diagnosis of these liver diseases.
id pubmed-9472076
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9472076 2022-09-15 JHEP Rep Research Article Elsevier 2022-08-18 /pmc/articles/PMC9472076/ /pubmed/36119721 http://dx.doi.org/10.1016/j.jhepr.2022.100560 Text en © 2022 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Differentiating between liver diseases by applying multiclass machine learning approaches to transcriptomics of liver tissue or blood-based samples
topic Research Article