Predicting the Lung Adenocarcinoma and Its Biomarkers by Integrating Gene Expression and DNA Methylation Data

The early symptoms of lung adenocarcinoma are not apparent, and clinical diagnosis relies primarily on X-ray examination and pathological section examination, whereas the discovery of biomarkers points to another direction for the diagnosis of lung adenocarcinoma...

Bibliographic Details
Main Authors: Qiu, Wang-Ren, Qi, Bei-Bei, Lin, Wei-Zhong, Zhang, Shou-Hua, Yu, Wang-Ke, Huang, Shun-Fa
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9280023/
https://www.ncbi.nlm.nih.gov/pubmed/35846148
http://dx.doi.org/10.3389/fgene.2022.926927
author Qiu, Wang-Ren
Qi, Bei-Bei
Lin, Wei-Zhong
Zhang, Shou-Hua
Yu, Wang-Ke
Huang, Shun-Fa
collection PubMed
description The early symptoms of lung adenocarcinoma are not apparent, and clinical diagnosis relies primarily on X-ray examination and pathological section examination, whereas the discovery of biomarkers points to another direction for the diagnosis of lung adenocarcinoma as bioinformatics technology develops. However, diagnosis is neither accurate nor trustworthy when it relies on omics data with high-dimension, low-sample-size (HDLSS) features or on biomarkers derived from only a single type of omics data. To address these problems, feature selection methods from biological analysis are used to reduce the dimension of the gene expression data (GSE19188) and the DNA methylation data (GSE139032, GSE49996). In addition, the Cartesian product method is used to expand the sample set and to integrate the gene expression and DNA methylation data. The classifier is built with a deep neural network and evaluated by K-fold cross-validation. Moreover, gene ontology analysis and literature retrieval are used to analyze the biological relevance of the selected genes, and the TCGA database is used for survival analysis of these candidate genes through Kaplan-Meier estimates to explore the detailed molecular mechanism of lung adenocarcinoma. Survival analysis shows that COL5A2 and SERPINB5 are significant for identifying lung adenocarcinoma and are considered biomarkers of lung adenocarcinoma.
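The Cartesian product step named in the description can be illustrated with a minimal sketch. The record does not say how samples are paired, so this sketch assumes that every gene expression sample is concatenated with every DNA methylation sample carrying the same class label. The function name `cartesian_integrate` and the toy arrays are hypothetical and are not taken from the article.

```python
import numpy as np


def cartesian_integrate(expr, expr_labels, meth, meth_labels):
    """Pair every expression sample with every methylation sample of the
    same class and concatenate their feature vectors.

    Assumption (not stated in the record): pairing is done within a class.
    """
    features, labels = [], []
    for label in np.unique(expr_labels):
        e = expr[expr_labels == label]  # shape (n_e, p)
        m = meth[meth_labels == label]  # shape (n_m, q)
        if len(m) == 0:
            continue  # no methylation sample for this class
        # Row i of `left` and row i of `right` together enumerate all
        # n_e * n_m (expression, methylation) pairs.
        left = np.repeat(e, len(m), axis=0)
        right = np.tile(m, (len(e), 1))
        features.append(np.hstack([left, right]))
        labels.append(np.full(len(e) * len(m), label))
    return np.vstack(features), np.concatenate(labels)


# Toy data: 3 expression samples (2 features), 3 methylation samples (1 feature).
expr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
expr_labels = np.array([0, 0, 1])
meth = np.array([[0.1], [0.2], [0.9]])
meth_labels = np.array([0, 1, 1])

X, y = cartesian_integrate(expr, expr_labels, meth, meth_labels)
print(X.shape, y.tolist())
```

With n_e expression samples and n_m methylation samples in a class, the pairing yields n_e * n_m integrated samples for that class, which is how the sample set grows. Because pairs built from the same source sample are not independent, a K-fold split would probably need to group them by source sample to avoid leakage between folds; whether the article does so is not stated here.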
format Online
Article
Text
id pubmed-9280023
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9280023 2022-07-15 Predicting the Lung Adenocarcinoma and Its Biomarkers by Integrating Gene Expression and DNA Methylation Data Qiu, Wang-Ren Qi, Bei-Bei Lin, Wei-Zhong Zhang, Shou-Hua Yu, Wang-Ke Huang, Shun-Fa Front Genet Genetics Frontiers Media S.A. 2022-06-30 /pmc/articles/PMC9280023/ /pubmed/35846148 http://dx.doi.org/10.3389/fgene.2022.926927 Text en Copyright © 2022 Qiu, Qi, Lin, Zhang, Yu and Huang.
https://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Predicting the Lung Adenocarcinoma and Its Biomarkers by Integrating Gene Expression and DNA Methylation Data
topic Genetics