Predicting liver cancer on epigenomics data using machine learning

Epigenomics is the branch of biology concerned with the phenotype modifications that do not induce any change in the cell DNA sequence. Epigenetic modifications apply changes to the properties of DNA, which ultimately prevents such DNA actions from being executed. These alterations arise in the canc...

Bibliographic Details
Main Authors: Vekariya, Vishalkumar; Passi, Kalpdrum; Jain, Chakresh Kumar
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580905/
https://www.ncbi.nlm.nih.gov/pubmed/36304318
http://dx.doi.org/10.3389/fbinf.2022.954529
author Vekariya, Vishalkumar
Passi, Kalpdrum
Jain, Chakresh Kumar
collection PubMed
description Epigenomics is the branch of biology concerned with phenotype modifications that do not change the DNA sequence of the cell. Epigenetic modifications alter the properties of DNA, which ultimately prevents the affected DNA actions from being executed. These alterations arise in cancer cells, which is the only cause of cancer. The liver is the metabolic cleansing center of the human body and the only organ that can regenerate itself, but liver cancer can stop the cleansing of the body. In this research, machine learning techniques are used to predict the gene expression of liver cells for liver hepatocellular carcinoma (LIHC), which is the third leading cause of cancer death and affects five hundred thousand people per year. The LIHC data comprise four types: methylation, histone, human genome, and RNA sequence data. The data were accessed from The Cancer Genome Atlas (TCGA) using open-source tools in the R programming language. The proposed method considers 1,000 features across the four data types. Nine feature selection methods and eight classification methods were compared to select the best model, using 5-fold cross-validation and different training-to-test ratios. The best model used 140 features selected by ReliefF with the XGBoost classifier, achieving an AUC of 1.0 and an accuracy of 99.67% in predicting liver cancer.
format Online
Article
Text
id pubmed-9580905
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9580905 2022-10-26 Predicting liver cancer on epigenomics data using machine learning Vekariya, Vishalkumar; Passi, Kalpdrum; Jain, Chakresh Kumar Front Bioinform Bioinformatics Frontiers Media S.A. 2022-09-27 /pmc/articles/PMC9580905/ /pubmed/36304318 http://dx.doi.org/10.3389/fbinf.2022.954529 Text en Copyright © 2022 Vekariya, Passi and Jain. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Predicting liver cancer on epigenomics data using machine learning
topic Bioinformatics