A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data

DNA methylation change has been useful for cancer biomarker discovery, classification, and potential treatment development. So far, existing methods use either differentially methylated CpG sites or combined CpG sites, namely differentially methylated regions, that can be mapped to genes. However, s...

Bibliographic Details
Main Authors: Mallik, Saurav, Seth, Soumita, Bhadra, Tapas, Zhao, Zhongming
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7465138/
https://www.ncbi.nlm.nih.gov/pubmed/32806782
http://dx.doi.org/10.3390/genes11080931
_version_ 1783577521505173504
author Mallik, Saurav
Seth, Soumita
Bhadra, Tapas
Zhao, Zhongming
collection PubMed
description DNA methylation change has been useful for cancer biomarker discovery, classification, and potential treatment development. So far, existing methods use either differentially methylated CpG sites or combined CpG sites, namely differentially methylated regions, that can be mapped to genes. However, such methylation signal mapping has limitations. To address these limitations, in this study we introduced a combinatorial framework that uses linear regression, differential expression analysis, and a deep learning method for accurate biological interpretation of DNA methylation by integrating DNA methylation data with corresponding TCGA gene expression data. We demonstrated it for uterine cervical cancer. First, we pre-filtered outliers from the data set and then predicted gene expression values from the pre-filtered methylation data through linear regression. We identified differentially expressed genes (DEGs) with an Empirical Bayes test using [Formula: see text]. Then we applied a deep learning method, “nnet”, to classify the cervical cancer label of those DEGs and to determine all classification metrics, including accuracy and area under the curve (AUC), through 10-fold cross-validation. We applied our approach to a uterine cervical cancer DNA methylation dataset (NCBI accession ID: GSE30760; 27,578 features covering 63 tumor and 152 matched normal samples). After linear regression and differential expression analysis, we obtained 6287 DEGs with a false discovery rate (FDR) [Formula: see text]. After performing deep learning analysis, we obtained an average classification accuracy of [Formula: see text] ([Formula: see text]) for the uterine cervical cancer labels. This performance is better than that of other peer methods. We performed in-degree and out-degree hub gene network analysis using [Formula: see text]. We reported the five top in-degree genes ([Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text] and [Formula: see text]) and the five top out-degree genes ([Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text] and [Formula: see text]). After that, we performed KEGG pathway and Gene Ontology enrichment analysis of the DEGs using the WebGestalt (WEB-based Gene SeT AnaLysis Toolkit) tool. In summary, our proposed framework, which integrates linear regression, differential expression analysis, and deep learning, provides a robust approach to better interpret DNA methylation and gene expression data in disease studies.
format Online
Article
Text
id pubmed-7465138
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7465138 2020-09-04 Genes (Basel) Article MDPI 2020-08-12 /pmc/articles/PMC7465138/ /pubmed/32806782 http://dx.doi.org/10.3390/genes11080931 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7465138/
https://www.ncbi.nlm.nih.gov/pubmed/32806782
http://dx.doi.org/10.3390/genes11080931