DEGnext: classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning
Main authors: | Kakati, Tulika; Bhattacharyya, Dhruba K.; Kalita, Jugal K.; Norden-Krichmar, Trina M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8734099/ https://www.ncbi.nlm.nih.gov/pubmed/34991439 http://dx.doi.org/10.1186/s12859-021-04527-4 |
author | Kakati, Tulika; Bhattacharyya, Dhruba K.; Kalita, Jugal K.; Norden-Krichmar, Trina M. |
---|---|
collection | PubMed |
description | BACKGROUND: A limitation of traditional differential expression analysis on small datasets is the possibility of false positives and false negatives due to sample variation. Considering the recent advances in deep learning (DL)-based models, we wanted to expand the state of the art in disease biomarker prediction from RNA-seq data using DL. However, applying DL to RNA-seq data is challenging due to the absence of appropriate labels and the small sample size relative to the number of genes. Deep learning coupled with transfer learning can improve prediction performance on novel data by incorporating patterns learned from other related data. With the emergence of new disease datasets, biomarker prediction would be facilitated by a generalized model that can transfer the knowledge of trained feature maps to the new dataset. To the best of our knowledge, there is no Convolutional Neural Network (CNN)-based model coupled with transfer learning to predict the significant upregulating (UR) and downregulating (DR) genes from both trained and untrained datasets. RESULTS: We implemented a CNN model, DEGnext, to predict UR and DR genes from gene expression data obtained from The Cancer Genome Atlas database. DEGnext uses biologically validated data along with logarithmic fold change values to classify differentially expressed genes (DEGs) as UR and DR genes. We applied transfer learning to our model to carry the knowledge of trained feature maps over to untrained cancer datasets. DEGnext’s results were competitive (ROC scores between 88 and 99%) with those of five traditional machine learning methods: Decision Tree, K-Nearest Neighbors, Random Forest, Support Vector Machine, and XGBoost. DEGnext was robust and effective in transferring learned feature maps to facilitate classification of unseen datasets. Additionally, we validated that the predicted DEGs from DEGnext mapped to significant Gene Ontology terms and pathways related to cancer. CONCLUSIONS: DEGnext can classify DEGs into UR and DR genes from RNA-seq cancer datasets with high performance. This type of analysis, using biologically relevant fine-tuning data, may aid in the exploration of potential biomarkers and can be adapted for other disease datasets. |
format | Online Article Text |
id | pubmed-8734099 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics, Methodology Article |
publication date | 2022-01-06 |
rights | © The Author(s) 2021. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | DEGnext: classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning |
topic | Methodology Article |