
Identification of lung cancer gene markers through kernel maximum mean discrepancy and information entropy

BACKGROUND: Early diagnosis of lung cancer has long been a critical problem in clinical practice, and identifying differentially expressed genes as disease markers is a promising solution. However, most existing gene differential expression analysis (DEA) methods have two main drawba...


Bibliographic Details
Main authors:	Zhao, Zhixun; Peng, Hui; Zhang, Xiaocai; Zheng, Yi; Chen, Fang; Fang, Liang; Li, Jinyan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923882/ https://www.ncbi.nlm.nih.gov/pubmed/31856830 http://dx.doi.org/10.1186/s12920-019-0630-4

_version_	1783481612988579840
author	Zhao, Zhixun Peng, Hui Zhang, Xiaocai Zheng, Yi Chen, Fang Fang, Liang Li, Jinyan
author_facet	Zhao, Zhixun Peng, Hui Zhang, Xiaocai Zheng, Yi Chen, Fang Fang, Liang Li, Jinyan
author_sort	Zhao, Zhixun
collection	PubMed
description	BACKGROUND: Early diagnosis of lung cancer has long been a critical problem in clinical practice, and identifying differentially expressed genes as disease markers is a promising solution. However, most existing gene differential expression analysis (DEA) methods have two main drawbacks: first, they rely on fixed statistical hypotheses and are not always effective; second, they cannot identify a definite expression level boundary when there is no obvious expression level gap between the control and experiment groups. METHODS: This paper proposes a novel approach to identify marker genes and gene expression level boundaries for lung cancer. By calculating a kernel maximum mean discrepancy, the method evaluates the expression differences between normal, normal adjacent to tumor (NAT), and tumor samples. For the potential marker genes, the expression level boundaries among the different groups are defined with the information entropy method. RESULTS: Compared with two conventional methods, the t-test and fold change, the top average-ranked genes selected by the method achieve better performance under all metrics in 10-fold cross-validation. GO and KEGG enrichment analyses are then conducted to explore the biological function of the top 100 ranked genes. Finally, the top 10 average-ranked genes are chosen as lung cancer markers, and their expression boundaries are calculated and reported. CONCLUSION: The proposed approach is effective at identifying gene markers for lung cancer diagnosis. It is not only more accurate than conventional DEA methods but also provides a reliable way to identify gene expression level boundaries.
format	Online Article Text
id	pubmed-6923882
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6923882 2019-12-30 Identification of lung cancer gene markers through kernel maximum mean discrepancy and information entropy Zhao, Zhixun Peng, Hui Zhang, Xiaocai Zheng, Yi Chen, Fang Fang, Liang Li, Jinyan BMC Med Genomics Research
BioMed Central 2019-12-20 /pmc/articles/PMC6923882/ /pubmed/31856830 http://dx.doi.org/10.1186/s12920-019-0630-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Identification of lung cancer gene markers through kernel maximum mean discrepancy and information entropy
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923882/ https://www.ncbi.nlm.nih.gov/pubmed/31856830 http://dx.doi.org/10.1186/s12920-019-0630-4
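
The METHODS summary in the description names two computations: a kernel maximum mean discrepancy between sample groups, and an entropy-based expression level boundary. The sketch below illustrates both ideas for a single gene, and it is not the authors' implementation. The Gaussian kernel, the bandwidth `sigma`, the biased MMD estimator, the minimum weighted-entropy cut rule, and the toy expression values are all assumptions made for illustration; the record does not specify any of them.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    # Assumed kernel choice; the record does not say which kernel was used.
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between two 1-D samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, sigma) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h


def entropy_boundary(xs, ys):
    """Cut point that minimises the weighted label entropy of both sides.

    Hypothetical boundary rule, used here only to illustrate the idea.
    """
    pairs = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    best_cut, best_h = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut possible between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if h < best_h:
            best_cut, best_h = cut, h
    return best_cut, best_h


# Toy expression values for one gene (made up for this example).
normal = [1.0, 1.2, 0.9, 1.1]
tumor = [3.0, 3.2, 2.9, 3.1]

score = mmd_squared(normal, tumor)
cut, cut_entropy = entropy_boundary(normal, tumor)
```

Under these assumptions, a larger `score` indicates a larger difference between the two groups' expression distributions, and `cut` is the threshold that separates the groups with the lowest remaining label entropy.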
