Prediction of lung cancer using gene expression and deep learning with KL divergence gene selection

Bibliographic Details
Main authors: Liu, Suli; Yao, Wu
Format: Online article (text)
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9103042/
https://www.ncbi.nlm.nih.gov/pubmed/35549644
http://dx.doi.org/10.1186/s12859-022-04689-9
collection PubMed
description BACKGROUND: Lung cancer is one of the cancers with the highest mortality rates in China. With the rapid development of high-throughput sequencing technology and the growing use of deep learning methods, deep neural networks based on gene expression have become an active research direction in lung cancer diagnosis, offering an effective route to early diagnosis. Building a deep neural network model is therefore of great significance for the early diagnosis of lung cancer. However, the main challenges in mining gene expression datasets are the curse of dimensionality and imbalanced data. Existing methods proposed by some researchers cannot address high dimensionality and imbalanced data, because the number of measured variables (genes) vastly exceeds the number of samples; this results in poor performance in the early diagnosis of lung cancer.
METHOD: Given that gene expression datasets are small, high-dimensional, and imbalanced, this paper proposes a gene selection method based on KL divergence, which selects the genes with the highest KL divergence as model features. A deep neural network model is then built with Focal Loss as the loss function, and k-fold cross-validation with k = 5 is used to validate and select the best model.
RESULT: The proposed deep learning model based on KL divergence gene selection achieves an AUC of 0.99 on the validation set, and the model shows high generalization performance.
CONCLUSION: The deep neural network model based on KL divergence gene selection proposed in this paper is shown to be an accurate and effective method for lung cancer prediction.
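The METHOD section says genes with higher KL divergence are kept as features but does not say how each gene's distributions are estimated. The sketch below is one plausible reading, not the authors' code: it assumes per-gene histograms of expression values in tumor (label 1) and normal (label 0) samples over shared bin edges, additive smoothing so no bin is empty, the direction KL(tumor || normal), and a fixed top-k cutoff. All function names, `n_bins`, `eps`, and `k` are hypothetical.

```python
import math


def smoothed_histogram(values, lo, hi, n_bins, eps=1e-6):
    """Normalized histogram of `values` over [lo, hi] with additive smoothing.

    Smoothing keeps every bin probability > 0 so the KL divergence stays finite.
    """
    counts = [0.0] * n_bins
    width = (hi - lo) / n_bins if hi > lo else 1.0  # constant gene: everything lands in bin 0
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)  # clamp v == hi into the last bin
        counts[idx] += 1.0
    total = sum(counts) + eps * n_bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """Discrete D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def rank_genes_by_kl(X, y, n_bins=10):
    """Score each gene (column of X) by KL(tumor distribution || normal distribution).

    X: list of samples, each a list of gene expression values.
    y: list of 0/1 labels (1 = tumor, 0 = normal; an assumed encoding).
    Returns (gene indices sorted by descending score, per-gene scores).
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        col = [row[g] for row in X]
        lo, hi = min(col), max(col)  # shared bin edges for both classes
        tumor = [v for v, label in zip(col, y) if label == 1]
        normal = [v for v, label in zip(col, y) if label == 0]
        p = smoothed_histogram(tumor, lo, hi, n_bins)
        q = smoothed_histogram(normal, lo, hi, n_bins)
        scores.append(kl_divergence(p, q))
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order, scores


def select_top_k_genes(X, y, k, n_bins=10):
    """Keep the k genes with the highest KL score; return their indices and the reduced matrix."""
    order, _ = rank_genes_by_kl(X, y, n_bins)
    keep = order[:k]
    return keep, [[row[g] for g in keep] for row in X]


# Toy data: gene 0 separates the classes, gene 1 is constant, gene 2 overlaps.
X = [[0.1, 5.0, 1.0], [0.2, 5.0, 2.0], [0.3, 5.0, 1.5],
     [0.9, 5.0, 1.2], [1.0, 5.0, 1.8], [0.95, 5.0, 1.4]]
y = [0, 0, 0, 1, 1, 1]
keep, X_small = select_top_k_genes(X, y, k=2, n_bins=2)
print(keep)  # [0, 2]
```

A constant gene scores exactly zero because both class histograms are identical, so it is ranked last. With real expression data (thousands of genes, few samples), the bin count and smoothing constant noticeably affect the ranking, so they would need tuning.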
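The abstract names Focal Loss as the training objective for the imbalanced data but gives no hyperparameters. The sketch below shows the binary form of focal loss as introduced by Lin et al. (2017), FL(p_t) = -α_t (1 - p_t)^γ log(p_t), computed in plain Python on predicted probabilities rather than on framework tensors. The defaults γ = 2 and α = 0.25 are the values commonly used in that original work and are an assumption here; the function name is hypothetical.

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs:  predicted probabilities of the positive (tumor) class.
    labels: 0/1 ground truth.
    gamma:  focusing parameter; gamma = 0 recovers alpha-weighted cross-entropy.
    alpha:  weight for the positive class (1 - alpha for the negative class).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
        p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


easy = binary_focal_loss([0.9], [1])  # well-classified tumor sample
hard = binary_focal_loss([0.1], [1])  # misclassified tumor sample
print(f"easy={easy:.6f} hard={hard:.6f} ratio={hard / easy:.0f}")
```

The modulating factor (1 - p_t)^γ shrinks the loss of confidently correct samples (here by a factor of 0.01 at p_t = 0.9), so the many easy majority-class samples contribute little and training focuses on hard, misclassified ones. In a real network the same formula would be applied to the model's sigmoid outputs inside the training loop.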
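The METHOD section uses five-fold cross-validation to validate and select the best model, and RESULT reports AUC. The abstract does not say whether folds are stratified or how the "best" model is chosen, so the sketch below assumes stratified folds (so every fold keeps both tumor and normal samples despite the imbalance) and keeps the fold model with the highest validation AUC while also reporting the mean. AUC is computed as the probability that a random positive outscores a random negative (the Mann-Whitney formulation). `fit_centroid_scorer` is a hypothetical placeholder for the deep network; any `fit(X, y)` that returns a scoring function can be swapped in.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    for val in folds:
        val_set = set(val)
        train = [i for i in range(len(labels)) if i not in val_set]
        yield train, sorted(val)


def roc_auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(X, y):
    """Hypothetical stand-in for the deep network: higher score = closer to the tumor centroid."""
    dims = range(len(X[0]))

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in dims]

    c_pos = centroid([x for x, t in zip(X, y) if t == 1])
    c_neg = centroid([x for x, t in zip(X, y) if t == 0])

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)


def cross_validate(X, y, fit, k=5, seed=0):
    """Train one model per fold, score it on the held-out fold, and keep the best one."""
    results = []
    for train, val in stratified_k_fold(y, k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        auc = roc_auc([model(X[i]) for i in val], [y[i] for i in val])
        results.append((auc, model))
    mean_auc = sum(auc for auc, _ in results) / len(results)
    best_auc, best_model = max(results, key=lambda r: r[0])
    return mean_auc, best_auc, best_model


# Toy, perfectly separable data: tumor samples 10..19, normal samples 0..9.
X = [[10.0 + i] for i in range(10)] + [[float(i)] for i in range(10)]
y = [1] * 10 + [0] * 10
mean_auc, best_auc, _ = cross_validate(X, y, fit_centroid_scorer, k=5)
print(mean_auc, best_auc)
```

The pairwise AUC is O(n_pos × n_neg), which is fine for small validation folds. If gene selection is part of the pipeline, running it inside each training fold keeps the held-out fold from influencing which genes are kept.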
id pubmed-9103042
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9103042 2022-05-14 Prediction of lung cancer using gene expression and deep learning with KL divergence gene selection Liu, Suli Yao, Wu BMC Bioinformatics Research
BioMed Central 2022-05-12 /pmc/articles/PMC9103042/ /pubmed/35549644 http://dx.doi.org/10.1186/s12859-022-04689-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research