Prediction of lung cancer using gene expression and deep learning with KL divergence gene selection
BACKGROUND: Lung cancer is one of the cancers with the highest mortality rate in China. With the rapid development of high-throughput sequencing technology and the research and application of deep learning methods in recent years, deep neural networks based on gene expression have become a hot resea...
Main Authors: | Liu, Suli; Yao, Wu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9103042/ https://www.ncbi.nlm.nih.gov/pubmed/35549644 http://dx.doi.org/10.1186/s12859-022-04689-9 |
_version_ | 1784707468860850176 |
---|---|
author | Liu, Suli Yao, Wu |
author_facet | Liu, Suli Yao, Wu |
author_sort | Liu, Suli |
collection | PubMed |
description | BACKGROUND: Lung cancer is one of the cancers with the highest mortality rate in China. With the rapid development of high-throughput sequencing technology and the growing application of deep learning methods, deep neural networks based on gene expression have become an active research direction in lung cancer diagnosis and offer an effective route to early diagnosis. Building a deep neural network model is therefore of great significance for the early diagnosis of lung cancer. However, the main challenges in mining gene expression datasets are the curse of dimensionality and imbalanced data. Existing methods cannot address the problems of high dimensionality and imbalanced data, because the number of measured variables (genes) overwhelms the small number of samples, which results in poor performance in the early diagnosis of lung cancer. METHOD: Given that gene expression datasets are small, high-dimensional and imbalanced, this paper proposes a gene selection method based on KL divergence, which selects genes with higher KL divergence as model features. A deep neural network model is then built using focal loss as the loss function, and k-fold cross-validation with k = 5 is used to verify and select the best model. RESULT: The proposed deep learning model based on KL divergence gene selection has an AUC of 0.99 on the validation set. The generalization performance of the model is high. CONCLUSION: The deep neural network model based on KL divergence gene selection proposed in this paper is shown to be an accurate and effective method for lung cancer prediction. |
format | Online Article Text |
id | pubmed-9103042 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9103042 2022-05-14 Prediction of lung cancer using gene expression and deep learning with KL divergence gene selection Liu, Suli Yao, Wu BMC Bioinformatics Research BACKGROUND: Lung cancer is one of the cancers with the highest mortality rate in China. With the rapid development of high-throughput sequencing technology and the growing application of deep learning methods, deep neural networks based on gene expression have become an active research direction in lung cancer diagnosis and offer an effective route to early diagnosis. Building a deep neural network model is therefore of great significance for the early diagnosis of lung cancer. However, the main challenges in mining gene expression datasets are the curse of dimensionality and imbalanced data. Existing methods cannot address the problems of high dimensionality and imbalanced data, because the number of measured variables (genes) overwhelms the small number of samples, which results in poor performance in the early diagnosis of lung cancer. METHOD: Given that gene expression datasets are small, high-dimensional and imbalanced, this paper proposes a gene selection method based on KL divergence, which selects genes with higher KL divergence as model features. A deep neural network model is then built using focal loss as the loss function, and k-fold cross-validation with k = 5 is used to verify and select the best model. RESULT: The proposed deep learning model based on KL divergence gene selection has an AUC of 0.99 on the validation set. The generalization performance of the model is high. CONCLUSION: The deep neural network model based on KL divergence gene selection proposed in this paper is shown to be an accurate and effective method for lung cancer prediction. BioMed Central 2022-05-12 /pmc/articles/PMC9103042/ /pubmed/35549644 http://dx.doi.org/10.1186/s12859-022-04689-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Liu, Suli Yao, Wu Prediction of lung cancer using gene expression and deep learning with KL divergence gene selection |
title | Prediction of lung cancer using gene expression and deep learning with KL divergence gene selection |
title_sort | prediction of lung cancer using gene expression and deep learning with kl divergence gene selection |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9103042/ https://www.ncbi.nlm.nih.gov/pubmed/35549644 http://dx.doi.org/10.1186/s12859-022-04689-9 |
work_keys_str_mv | AT liusuli predictionoflungcancerusinggeneexpressionanddeeplearningwithkldivergencegeneselection AT yaowu predictionoflungcancerusinggeneexpressionanddeeplearningwithkldivergencegeneselection |
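
The description above names two techniques, KL divergence gene selection and focal loss, without giving their details. The sketch below is one possible reading, not the authors' implementation. It assumes that a gene's score is the KL divergence between smoothed histograms of its expression in cancer samples and in normal samples, and that the loss is the standard binary focal loss. The function names, the bin count, the add-one smoothing, and the `gamma` and `alpha` defaults are all illustrative assumptions that do not come from the record.

```python
import math


def histogram(values, lo, hi, bins):
    """Smoothed bin probabilities for values on the shared range [lo, hi]."""
    counts = [1.0] * bins  # add-one smoothing (assumption) avoids log(0)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def gene_kl_score(cancer, normal, bins=10):
    """KL divergence between the class-conditional histograms of one gene."""
    lo = min(min(cancer), min(normal))
    hi = max(max(cancer), max(normal))
    return kl_divergence(histogram(cancer, lo, hi, bins),
                         histogram(normal, lo, hi, bins))


def select_top_genes(expr, labels, k, bins=10):
    """Rank genes by KL score and return the indices of the k highest.

    expr   : list of samples, each a list of expression values (one per gene)
    labels : 1 for cancer, 0 for normal (hypothetical encoding)
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        cancer = [row[g] for row, y in zip(expr, labels) if y == 1]
        normal = [row[g] for row, y in zip(expr, labels) if y == 0]
        scores.append(gene_kl_score(cancer, normal, bins))
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:k], scores


def focal_loss(prob, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample; prob is the predicted P(label == 1)."""
    p_t = prob if label == 1 else 1.0 - prob
    a_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to keep the logarithm finite
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Under these assumptions, a gene whose expression differs sharply between classes gets a higher score than one whose class distributions overlap. The `(1 - p_t) ** gamma` factor shrinks the loss on well-classified samples, which is the usual motivation for focal loss on imbalanced data. In a 5-fold cross-validation setup, the gene ranking would normally be recomputed on each training fold to avoid leaking validation samples into feature selection; the record does not say whether the authors did so.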