Classification of early and late stage liver hepatocellular carcinoma patients from their genomics and epigenomics profiles

BACKGROUND: Liver Hepatocellular Carcinoma (LIHC) is one of the major cancers worldwide, responsible for millions of premature deaths every year. Prediction of clinical staging is vital to implement optimal therapeutic strategy and prognostic prediction in cancer patients. However, to date, no metho...

Bibliographic Details
Main Authors: Kaur, Harpreet, Bhalla, Sherry, Raghava, Gajendra P. S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6730898/
https://www.ncbi.nlm.nih.gov/pubmed/31490960
http://dx.doi.org/10.1371/journal.pone.0221476
author Kaur, Harpreet
Bhalla, Sherry
Raghava, Gajendra P. S.
collection PubMed
description BACKGROUND: Liver Hepatocellular Carcinoma (LIHC) is one of the major cancers worldwide, responsible for millions of premature deaths every year. Prediction of clinical staging is vital to implement optimal therapeutic strategy and prognostic prediction in cancer patients. However, to date, no method has been developed for predicting the stage of LIHC from the genomic profile of samples. METHODS: The Cancer Genome Atlas (TCGA) dataset of 173 early stage (stage-I), 177 late stage (stage-II, stage-III and stage-IV) and 50 adjacent normal tissue samples, covering 60,483 RNA transcripts and 485,577 methylation CpG sites, was extensively analyzed to identify the key transcriptomic expression and methylation-based features using different feature selection techniques. Further, classification models were developed from the selected key features with different machine learning algorithms to categorize the different classes of samples. RESULTS: In the current study, in silico models have been developed for classifying LIHC patients into early vs. late stage and cancerous vs. normal samples using RNA expression and DNA methylation data. TCGA datasets were extensively analyzed to identify differentially expressed RNA transcripts and methylated CpG sites that can discriminate early vs. late stages and cancer vs. normal samples of LIHC with high precision. A Naive Bayes model developed using 51 features, combining 21 CpG methylation sites and 30 RNA transcripts, achieved a maximum MCC (Matthews correlation coefficient) of 0.58 with an accuracy of 78.87% on the validation dataset in discriminating early from late stage. Additionally, the prediction models developed based on 5 RNA transcripts and 5 CpG sites classify LIHC and normal samples with an accuracy of 96–98% and an AUC (Area Under the Receiver Operating Characteristic curve) of 0.99. In addition, multiclass models were developed for classifying samples into normal, early and late stage of cancer, achieving an accuracy of 76.54% and an AUC of 0.86. CONCLUSION: Our study reveals that predicting the stage of LIHC samples with high accuracy from genomic and epigenomic profiles is a challenging task in comparison to the classification of cancerous and normal samples. The comprehensive analysis, differentially expressed RNA transcripts, methylated CpG sites in LIHC samples and prediction models are available from CancerLSP (http://webs.iiitd.edu.in/raghava/cancerlsp/).
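The abstract reports performance as accuracy and MCC. As a reading aid, the following minimal sketch shows how these two metrics are conventionally computed from a binary confusion matrix. The function names and the counts are hypothetical illustrations, not values or code from the study, and the convention of returning 0.0 for an undefined MCC is an assumption of this sketch.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero (an assumed convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, e.g. treating "late stage" as the positive class.
tp, tn, fp, fn = 40, 35, 10, 15
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

Because MCC uses all four cells of the confusion matrix, it can stay modest even when accuracy looks fairly high, which is one reason the two figures are often reported together.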
format Online
Article
Text
id pubmed-6730898
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6730898 2019-09-16 Kaur, Harpreet Bhalla, Sherry Raghava, Gajendra P. S. PLoS One Research Article Public Library of Science 2019-09-06 /pmc/articles/PMC6730898/ /pubmed/31490960 http://dx.doi.org/10.1371/journal.pone.0221476 Text en © 2019 Kaur et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Classification of early and late stage liver hepatocellular carcinoma patients from their genomics and epigenomics profiles
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6730898/
https://www.ncbi.nlm.nih.gov/pubmed/31490960
http://dx.doi.org/10.1371/journal.pone.0221476