
Identification of gene signature in RNA-Seq hepatocellular carcinoma data by Pareto-optimal cluster algorithm

AIM: This study aimed to detect gene signatures in RNA-sequencing (RNA-seq) data using Pareto-optimal cluster size identification. BACKGROUND: RNA-seq has emerged as an important technology for transcriptome profiling in recent years. Gene expression signatures involving tens of genes have been prov...


Bibliographic Details
Main Authors: Kenarangi, Taiebe, Bakhshi, Enayatolah, InanlooRahatloo, Kolsoum, Biglarian, Akbar
Format: Online Article Text
Language: English
Published: Shaheed Beheshti University of Medical Sciences 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9876762/
https://www.ncbi.nlm.nih.gov/pubmed/36762216
http://dx.doi.org/10.22037/ghfbb.v15i4.2488
_version_ 1784878234882539520
author Kenarangi, Taiebe
Bakhshi, Enayatolah
InanlooRahatloo, Kolsoum
Biglarian, Akbar
collection PubMed
description AIM: This study aimed to detect gene signatures in RNA-sequencing (RNA-seq) data using Pareto-optimal cluster size identification. BACKGROUND: RNA-seq has emerged in recent years as an important technology for transcriptome profiling. Gene expression signatures involving tens of genes have been shown to be predictive of disease type and patient response to treatment. METHODS: A liver cancer RNA-seq dataset comprising 35 paired hepatocellular carcinoma (HCC) and non-tumor tissue samples was used in this study. Differentially expressed genes (DEGs) were identified after pre-filtering and normalization. A multi-objective optimization technique, multi-objective optimization for collecting cluster alternatives (MOCCA), was then used to find the Pareto-optimal cluster size for these DEGs, and k-means clustering was performed on the RNA-seq data. The best cluster, taken as a signature for the disease, was found by calculating the average pairwise Spearman's correlation score across all genes in the module. All analyses were performed in R 4.1.1 in a virtual environment with 100 GB of RAM. RESULTS: Using MOCCA, eight Pareto-optimal clusters were obtained. Ultimately, the two clusters with the greatest average Spearman's correlation coefficient scores were chosen as gene signatures. Eleven prognostic genes involved in the abnormal metabolism of HCC were identified. In addition, three differentially expressed pathways were identified between tumor and non-tumor tissues. CONCLUSION: The identified metabolic prognostic genes may provide more powerful prognostic information and improve survival prediction for HCC patients. In addition, Pareto-optimal cluster size identification is suggested for gene signature detection in other RNA-seq data.
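The cluster-scoring step in METHODS (average pairwise Spearman's correlation across the genes of a cluster) can be illustrated with a minimal sketch. This is not the authors' code: the study used R, whereas this sketch uses plain Python, and the function names, cluster names, gene names, and expression values are all hypothetical toy inputs. It assumes each gene has at least two distinct expression values, since the correlation is otherwise undefined.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def average_pairwise_spearman(cluster):
    """cluster maps gene name -> expression values across samples."""
    pairs = combinations(cluster.values(), 2)
    return mean(spearman(x, y) for x, y in pairs)


# Hypothetical toy clusters (assumption: not data from the study).
clusters = {
    "cluster_A": {
        "gene1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "gene2": [2.1, 3.9, 6.2, 8.0, 9.7],
        "gene3": [0.5, 0.9, 1.4, 2.2, 2.1],
    },
    "cluster_B": {
        "gene4": [1.0, 2.0, 3.0, 4.0, 5.0],
        "gene5": [5.0, 1.0, 4.0, 2.0, 3.0],
        "gene6": [2.0, 5.0, 1.0, 4.0, 3.0],
    },
}

scores = {name: average_pairwise_spearman(genes) for name, genes in clusters.items()}
# Rank clusters from most to least internally coherent.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this toy input the tightly co-varying genes of `cluster_A` score higher than the unrelated genes of `cluster_B`, which mirrors the idea of keeping the clusters with the greatest average correlation as candidate signatures.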
format Online
Article
Text
id pubmed-9876762
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Shaheed Beheshti University of Medical Sciences
record_format MEDLINE/PubMed
spelling pubmed-9876762 2023-02-08 Identification of gene signature in RNA-Seq hepatocellular carcinoma data by Pareto-optimal cluster algorithm Kenarangi, Taiebe Bakhshi, Enayatolah InanlooRahatloo, Kolsoum Biglarian, Akbar Gastroenterol Hepatol Bed Bench Original Article Shaheed Beheshti University of Medical Sciences 2022 /pmc/articles/PMC9876762/ /pubmed/36762216 http://dx.doi.org/10.22037/ghfbb.v15i4.2488 Text en https://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article, distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/) which permits others to copy and redistribute the material just in noncommercial usages, provided the original work is properly cited.
title Identification of gene signature in RNA-Seq hepatocellular carcinoma data by Pareto-optimal cluster algorithm
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9876762/
https://www.ncbi.nlm.nih.gov/pubmed/36762216
http://dx.doi.org/10.22037/ghfbb.v15i4.2488