Differentiating between liver diseases by applying multiclass machine learning approaches to transcriptomics of liver tissue or blood-based samples
BACKGROUND & AIMS: Liver disease carries significant healthcare burden and frequently requires a combination of blood tests, imaging, and invasive liver biopsy to diagnose. Distinguishing between inflammatory liver diseases, which may have similar clinical presentations, is particularly challeng...
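Main authors: | Listopad, Stanislav; Magnan, Christophe; Asghar, Aliya; Stolz, Andrew; Tayek, John A.; Liu, Zhang-Xu; Morgan, Timothy R.; Norden-Krichmar, Trina M. |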
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9472076/ https://www.ncbi.nlm.nih.gov/pubmed/36119721 http://dx.doi.org/10.1016/j.jhepr.2022.100560 |
_version_ | 1784789228206424064 |
---|---|
author | Listopad, Stanislav Magnan, Christophe Asghar, Aliya Stolz, Andrew Tayek, John A. Liu, Zhang-Xu Morgan, Timothy R. Norden-Krichmar, Trina M. |
author_sort | Listopad, Stanislav |
collection | PubMed |
description | BACKGROUND & AIMS: Liver disease carries significant healthcare burden and frequently requires a combination of blood tests, imaging, and invasive liver biopsy to diagnose. Distinguishing between inflammatory liver diseases, which may have similar clinical presentations, is particularly challenging. In this study, we implemented a machine learning pipeline for the identification of diagnostic gene expression biomarkers across several alcohol-associated and non-alcohol-associated liver diseases, using either liver tissue or blood-based samples. METHODS: We collected peripheral blood mononuclear cells (PBMCs) and liver tissue samples from participants with alcohol-associated hepatitis (AH), alcohol-associated cirrhosis (AC), non-alcohol-associated fatty liver disease, chronic HCV infection, and healthy controls. We performed RNA sequencing (RNA-seq) on 137 PBMC samples and 67 liver tissue samples. Using gene expression data, we implemented a machine learning feature selection and classification pipeline to identify diagnostic biomarkers which distinguish between the liver disease groups. The liver tissue results were validated using a public independent RNA-seq dataset. The biomarkers were computationally validated for biological relevance using pathway analysis tools. RESULTS: Utilizing liver tissue RNA-seq data, we distinguished between AH, AC, and healthy conditions with overall accuracies of 90% in our dataset, and 82% in the independent dataset, with 33 genes. Distinguishing 4 liver conditions and healthy controls yielded 91% overall accuracy in our liver tissue dataset with 39 genes, and 75% overall accuracy in our PBMC dataset with 75 genes. CONCLUSIONS: Our machine learning pipeline was effective at identifying a small set of diagnostic gene biomarkers and classifying several liver diseases using RNA-seq data from liver tissue and PBMCs. The methodologies implemented and genes identified in this study may facilitate future efforts toward a liquid biopsy diagnostic for liver diseases. LAY SUMMARY: Distinguishing between inflammatory liver diseases without multiple tests can be challenging due to their clinically similar characteristics. To lay the groundwork for the development of a non-invasive blood-based diagnostic across a range of liver diseases, we compared samples from participants with alcohol-associated hepatitis, alcohol-associated cirrhosis, chronic hepatitis C infection, and non-alcohol-associated fatty liver disease. We used a machine learning computational approach to demonstrate that gene expression data generated from either liver tissue or blood samples can be used to discover a small set of gene biomarkers for effective diagnosis of these liver diseases. |
format | Online Article Text |
id | pubmed-9472076 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9472076 2022-09-15 Differentiating between liver diseases by applying multiclass machine learning approaches to transcriptomics of liver tissue or blood-based samples Listopad, Stanislav Magnan, Christophe Asghar, Aliya Stolz, Andrew Tayek, John A. Liu, Zhang-Xu Morgan, Timothy R. Norden-Krichmar, Trina M. JHEP Rep Research Article Elsevier 2022-08-18 /pmc/articles/PMC9472076/ /pubmed/36119721 http://dx.doi.org/10.1016/j.jhepr.2022.100560 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Listopad, Stanislav Magnan, Christophe Asghar, Aliya Stolz, Andrew Tayek, John A. Liu, Zhang-Xu Morgan, Timothy R. Norden-Krichmar, Trina M. Differentiating between liver diseases by applying multiclass machine learning approaches to transcriptomics of liver tissue or blood-based samples |
title | Differentiating between liver diseases by applying multiclass machine learning approaches to transcriptomics of liver tissue or blood-based samples |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9472076/ https://www.ncbi.nlm.nih.gov/pubmed/36119721 http://dx.doi.org/10.1016/j.jhepr.2022.100560 |
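
The description above mentions a feature selection and classification pipeline applied to gene expression data, but this record does not name the algorithms used. As a rough illustration only, the sketch below ranks genes by an ANOVA-style ratio of between-class to within-class variance and then classifies samples with a nearest-centroid rule. Both techniques are stand-ins chosen for brevity, not the authors' method. The data are synthetic, and every name here (`make_synthetic`, `f_score`, `select_genes`, `fit_centroids`, `predict`, the gene indices, the class shift of 10) is a hypothetical assumption.

```python
import random
from statistics import mean, pvariance


def make_synthetic(n_per_class=20, n_genes=50, informative=(3, 17, 42), seed=0):
    """Build a toy expression matrix (hypothetical data, not from the study)."""
    rng = random.Random(seed)
    classes = ["AH", "AC", "healthy"]
    X, y = [], []
    for k, label in enumerate(classes):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in informative:
                row[g] += 10.0 * k  # assumed class-specific shift
            X.append(row)
            y.append(label)
    return X, y


def f_score(values, labels):
    """Ratio of between-class variance of means to mean within-class variance."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    between = pvariance([mean(g) for g in groups.values()])
    within = mean([pvariance(g) for g in groups.values()])
    return between / within if within > 0 else 0.0


def select_genes(X, y, k):
    """Return the indices of the k highest-scoring genes, sorted ascending."""
    n_genes = len(X[0])
    scores = [f_score([row[g] for row in X], y) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:k])


def fit_centroids(X, y, genes):
    """Compute the per-class mean vector over the selected genes."""
    rows_by_class = {}
    for row, lab in zip(X, y):
        rows_by_class.setdefault(lab, []).append([row[g] for g in genes])
    return {lab: [mean(col) for col in zip(*rows)]
            for lab, rows in rows_by_class.items()}


def predict(centroids, row, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    x = [row[g] for g in genes]
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


X, y = make_synthetic()
# Hold out every other sample; genes are selected on the training half only,
# so the held-out samples do not influence feature selection.
train = [i for i in range(len(X)) if i % 2 == 0]
test = [i for i in range(len(X)) if i % 2 == 1]
X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]

genes = select_genes(X_tr, y_tr, k=3)
centroids = fit_centroids(X_tr, y_tr, genes)
correct = sum(predict(centroids, X[i], genes) == y[i] for i in test)
accuracy = correct / len(test)
print("selected genes:", genes)
print("held-out accuracy:", accuracy)
```

Selecting genes on the training split alone is the one design point worth carrying over to real data: choosing features on all samples before splitting tends to inflate reported accuracy.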