A hybrid metaheuristic-deep learning technique for the pan-classification of cancer based on DNA methylation
BACKGROUND: DNA Methylation is one of the most important epigenetic processes that are crucial to regulating the functioning of the human genome without altering the DNA sequence. DNA Methylation data for cancer patients are becoming more accessible than ever, which is attributed to newer DNA sequen...
Main authors: | Eissa, Noureldin S.; Khairuddin, Uswah; Yusof, Rubiyah |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2022
|
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9275179/ https://www.ncbi.nlm.nih.gov/pubmed/35818034 http://dx.doi.org/10.1186/s12859-022-04815-7 |
_version_ | 1784745438769840128 |
---|---|
author | Eissa, Noureldin S. Khairuddin, Uswah Yusof, Rubiyah |
author_facet | Eissa, Noureldin S. Khairuddin, Uswah Yusof, Rubiyah |
author_sort | Eissa, Noureldin S. |
collection | PubMed |
description | BACKGROUND: DNA methylation is one of the most important epigenetic processes and is crucial to regulating the functioning of the human genome without altering the DNA sequence. DNA methylation data for cancer patients are becoming more accessible than ever, which is attributed to newer DNA sequencing technologies, notably the relatively low-cost DNA microarray technology by Illumina Infinium. This technology makes it possible to study DNA methylation at hundreds of thousands of different loci. Currently, most of the research found in the literature focuses on the discovery of DNA methylation markers for specific cancer types. A relatively small number of studies have attempted to find unified DNA methylation biomarkers that can diagnose different types of cancer (pan-cancer classification). RESULTS: The aim of this study is to conduct a pan-classification of cancer. We retrieved individual data for different types of cancer patients from The Cancer Genome Atlas (TCGA) portal. We selected data for nine cancer types: Breast Cancer (BRCA), Ovary Cancer (OV), Stomach Cancer (STOMACH), Colon Cancer (COAD), Kidney Cancer (KIRC), Liver Cancer (LIHC), Lung Cancer (LUSC), Prostate Cancer (PRAD) and Thyroid Cancer (THCA). The data were pre-processed and later used to build the required dataset. The system that we developed consists of two main stages. The purpose of the first stage is to perform feature selection and, therefore, decrease the dimensionality of the DNA methylation loci (features). This is accomplished using an unsupervised metaheuristic technique. As for the second stage, we used supervised machine learning and developed deep neural network (DNN) models to help classify the samples’ malignancy status and cancer type. Experimental results showed that, compared to recently published methods, our proposed system achieved better classification results in terms of recall, and similar or higher results in terms of precision and accuracy. 
The proposed system also achieved excellent receiver operating characteristic area under the curve (ROC AUC) values, ranging from 0.85 to 0.89. CONCLUSIONS: This research presented an effective new approach to classify different cancer types based on DNA methylation data retrieved from TCGA. The performance of the proposed system was compared to recently published works, using different performance metrics. It provided better results, confirming the effectiveness of the proposed method for classifying different cancer types based on DNA methylation data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04815-7. |
format | Online Article Text |
id | pubmed-9275179 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9275179 2022-07-13 A hybrid metaheuristic-deep learning technique for the pan-classification of cancer based on DNA methylation Eissa, Noureldin S. Khairuddin, Uswah Yusof, Rubiyah BMC Bioinformatics Research BioMed Central 2022-07-11 /pmc/articles/PMC9275179/ /pubmed/35818034 http://dx.doi.org/10.1186/s12859-022-04815-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Eissa, Noureldin S. Khairuddin, Uswah Yusof, Rubiyah A hybrid metaheuristic-deep learning technique for the pan-classification of cancer based on DNA methylation |
title | A hybrid metaheuristic-deep learning technique for the pan-classification of cancer based on DNA methylation |
title_full | A hybrid metaheuristic-deep learning technique for the pan-classification of cancer based on DNA methylation |
title_fullStr | A hybrid metaheuristic-deep learning technique for the pan-classification of cancer based on DNA methylation |
title_full_unstemmed | A hybrid metaheuristic-deep learning technique for the pan-classification of cancer based on DNA methylation |
title_short | A hybrid metaheuristic-deep learning technique for the pan-classification of cancer based on DNA methylation |
title_sort | hybrid metaheuristic-deep learning technique for the pan-classification of cancer based on dna methylation |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9275179/ https://www.ncbi.nlm.nih.gov/pubmed/35818034 http://dx.doi.org/10.1186/s12859-022-04815-7 |
work_keys_str_mv | AT eissanoureldins ahybridmetaheuristicdeeplearningtechniqueforthepanclassificationofcancerbasedondnamethylation AT khairuddinuswah ahybridmetaheuristicdeeplearningtechniqueforthepanclassificationofcancerbasedondnamethylation AT yusofrubiyah ahybridmetaheuristicdeeplearningtechniqueforthepanclassificationofcancerbasedondnamethylation AT eissanoureldins hybridmetaheuristicdeeplearningtechniqueforthepanclassificationofcancerbasedondnamethylation AT khairuddinuswah hybridmetaheuristicdeeplearningtechniqueforthepanclassificationofcancerbasedondnamethylation AT yusofrubiyah hybridmetaheuristicdeeplearningtechniqueforthepanclassificationofcancerbasedondnamethylation |
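
Illustrative note on the description field: the abstract describes a first stage that reduces the DNA methylation loci with an unsupervised metaheuristic, but this record names neither the metaheuristic nor its fitness function. The following is only a minimal sketch under stated assumptions, not the authors' method: it uses plain hill climbing with swap moves as a stand-in metaheuristic, total per-locus variance as a hypothetical label-free fitness, and a small made-up matrix of beta values. The names `fitness`, `select_features` and `toy` are invented for this example.

```python
import random
from statistics import pvariance


def fitness(data, subset):
    """Hypothetical unsupervised score: summed variance of the chosen loci.

    No class labels are used, which is what makes the selection unsupervised.
    """
    return sum(pvariance([row[j] for row in data]) for j in subset)


def select_features(data, k, iterations=200, seed=0):
    """Hill climbing over subsets of k loci (assumed stand-in metaheuristic).

    Each step swaps one selected locus for one unselected locus and keeps
    the swap only if the fitness strictly improves.
    """
    n_features = len(data[0])
    if not 0 < k < n_features:
        raise ValueError("k must be between 1 and n_features - 1")
    rng = random.Random(seed)
    current = set(rng.sample(range(n_features), k))
    best_score = fitness(data, current)
    for _ in range(iterations):
        out_j = rng.choice(sorted(current))
        in_j = rng.choice([j for j in range(n_features) if j not in current])
        candidate = (current - {out_j}) | {in_j}
        score = fitness(data, candidate)
        if score > best_score:
            current, best_score = candidate, score
    return sorted(current), best_score


# Made-up beta values: 6 samples (rows) x 5 loci (columns).
# Loci 1 and 3 vary strongly across samples; the others are nearly constant.
toy = [
    [0.50, 0.10, 0.40, 0.20, 0.60],
    [0.51, 0.90, 0.45, 0.80, 0.62],
    [0.49, 0.15, 0.42, 0.25, 0.61],
    [0.50, 0.85, 0.44, 0.75, 0.63],
    [0.52, 0.05, 0.41, 0.30, 0.60],
    [0.50, 0.95, 0.43, 0.70, 0.62],
]

selected, score = select_features(toy, k=2)
print("selected loci:", selected)
```

In a two-stage system like the one the abstract outlines, the columns returned by such a selection step would then be passed to the supervised classifier; the second stage is omitted here because the record gives no architectural details for the DNN models.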