Application of Feature Selection and Deep Learning for Cancer Prediction Using DNA Methylation Markers
DNA methylation is a process that can affect gene accessibility and therefore gene expression. In this study, a machine learning pipeline is proposed for the prediction of breast cancer and the identification of significant genes that contribute to the prediction. The current study utilized breast c...
Main authors: | Gomes, Rahul; Paul, Nijhum; He, Nichol; Huber, Aaron Francis; Jansen, Rick J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9498757/ https://www.ncbi.nlm.nih.gov/pubmed/36140725 http://dx.doi.org/10.3390/genes13091557 |
_version_ | 1784794839398744064 |
---|---|
author | Gomes, Rahul; Paul, Nijhum; He, Nichol; Huber, Aaron Francis; Jansen, Rick J. |
author_facet | Gomes, Rahul; Paul, Nijhum; He, Nichol; Huber, Aaron Francis; Jansen, Rick J. |
author_sort | Gomes, Rahul |
collection | PubMed |
description | DNA methylation is a process that can affect gene accessibility and therefore gene expression. In this study, a machine learning pipeline is proposed for the prediction of breast cancer and the identification of significant genes that contribute to the prediction. The current study utilized breast cancer methylation data from The Cancer Genome Atlas (TCGA), specifically the TCGA-BRCA dataset. Feature engineering techniques have been utilized to reduce data volume and make deep learning scalable. A comparative analysis of the proposed approach on Illumina 27K and 450K methylation data reveals that deep learning methodologies for cancer prediction can be coupled with feature selection models to enhance prediction accuracy. Prediction using 450K methylation markers can be accomplished in less than 13 s with an accuracy of 98.75%. Of the list of 685 genes in the feature-selected 27K dataset, 578 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in five biological processes and one molecular function. Of the list of 1572 genes in the feature-selected 450K dataset, 1290 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in 95 biological processes and 17 molecular functions. Seven oncogene/tumor suppressor genes were common between the 27K and 450K feature-selected gene sets. These genes were RTN4IP1, MYO18B, ANP32A, BRF1, SETBP1, NTRK1, and IGF2R. Our bioinformatics deep learning workflow, incorporating imputation and data balancing methods, is able to identify important methylation markers related to functionally important genes in breast cancer with high accuracy compared to deep learning or statistical models alone. |
format | Online Article Text |
id | pubmed-9498757 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9498757 2022-09-23 Application of Feature Selection and Deep Learning for Cancer Prediction Using DNA Methylation Markers Gomes, Rahul; Paul, Nijhum; He, Nichol; Huber, Aaron Francis; Jansen, Rick J. Genes (Basel) Article DNA methylation is a process that can affect gene accessibility and therefore gene expression. In this study, a machine learning pipeline is proposed for the prediction of breast cancer and the identification of significant genes that contribute to the prediction. The current study utilized breast cancer methylation data from The Cancer Genome Atlas (TCGA), specifically the TCGA-BRCA dataset. Feature engineering techniques have been utilized to reduce data volume and make deep learning scalable. A comparative analysis of the proposed approach on Illumina 27K and 450K methylation data reveals that deep learning methodologies for cancer prediction can be coupled with feature selection models to enhance prediction accuracy. Prediction using 450K methylation markers can be accomplished in less than 13 s with an accuracy of 98.75%. Of the list of 685 genes in the feature-selected 27K dataset, 578 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in five biological processes and one molecular function. Of the list of 1572 genes in the feature-selected 450K dataset, 1290 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in 95 biological processes and 17 molecular functions. Seven oncogene/tumor suppressor genes were common between the 27K and 450K feature-selected gene sets. These genes were RTN4IP1, MYO18B, ANP32A, BRF1, SETBP1, NTRK1, and IGF2R. Our bioinformatics deep learning workflow, incorporating imputation and data balancing methods, is able to identify important methylation markers related to functionally important genes in breast cancer with high accuracy compared to deep learning or statistical models alone. MDPI 2022-08-29 /pmc/articles/PMC9498757/ /pubmed/36140725 http://dx.doi.org/10.3390/genes13091557 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Gomes, Rahul; Paul, Nijhum; He, Nichol; Huber, Aaron Francis; Jansen, Rick J. Application of Feature Selection and Deep Learning for Cancer Prediction Using DNA Methylation Markers |
title | Application of Feature Selection and Deep Learning for Cancer Prediction Using DNA Methylation Markers |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9498757/ https://www.ncbi.nlm.nih.gov/pubmed/36140725 http://dx.doi.org/10.3390/genes13091557 |
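
The description above mentions imputation, data balancing, and feature selection ahead of the deep learning step, without naming the specific methods in this record. The following is a minimal sketch of how such preprocessing could look, under stated assumptions: column-mean imputation, random oversampling, and variance ranking are illustrative stand-ins chosen here, not the authors' methods, and all function names and the toy beta values are hypothetical.

```python
import random
from statistics import mean, pvariance


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.
    (Assumed imputation strategy; the record does not say which one was used.)"""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate samples of smaller classes until every class
    has as many samples as the largest one (assumed balancing strategy)."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for r in members + extra:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels


def select_top_variance(rows, k):
    """Return the indices of the k highest-variance columns, in ascending
    index order (assumed feature selection criterion)."""
    n_cols = len(rows[0])
    variances = [pvariance([r[j] for r in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy matrix: rows are samples, columns are methylation markers.
beta = [
    [0.10, 0.50, None],
    [0.90, 0.50, 0.20],
    [0.80, None, 0.25],
    [0.85, 0.50, 0.22],
]
labels = ["normal", "tumor", "tumor", "tumor"]

filled = impute_column_means(beta)
balanced, balanced_labels = oversample_minority(filled, labels)
keep = select_top_variance(balanced, k=2)
reduced = [[r[j] for j in keep] for r in balanced]
```

In this sketch, `reduced` is the smaller matrix that would be passed to a classifier such as a neural network. In practice the selection and balancing steps would normally be fitted on the training split only, so that no information leaks from the test samples.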