
Convolutional neural network models for cancer type prediction based on gene expression


Bibliographic Details
Main authors:	Mostavi, Milad; Chiu, Yu-Chiao; Huang, Yufei; Chen, Yidong
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7119277/ https://www.ncbi.nlm.nih.gov/pubmed/32241303 http://dx.doi.org/10.1186/s12920-020-0677-2

author	Mostavi, Milad; Chiu, Yu-Chiao; Huang, Yufei; Chen, Yidong
collection	PubMed
description	BACKGROUND: Precise prediction of cancer types is vital for cancer diagnosis and therapy. Through a predictive model, important cancer marker genes can be inferred. Several studies have attempted to build machine learning models for this task; however, none has taken into consideration the effects of tissue of origin, which can potentially bias the identification of cancer markers. RESULTS: In this paper, we introduce several Convolutional Neural Network (CNN) models that take unstructured gene expression inputs and classify tumor and non-tumor samples into their designated cancer types or as normal. Based on different designs of gene embeddings and convolution schemes, we implemented three CNN models: 1D-CNN, 2D-Vanilla-CNN, and 2D-Hybrid-CNN. The models were trained and tested on gene expression profiles from a combined 10,340 samples of 33 cancer types and 713 matched normal tissues from The Cancer Genome Atlas (TCGA). Our models achieved prediction accuracies of 93.9–95.0% among 34 classes (33 cancers and normal). Furthermore, we interpreted one of the models, the 1D-CNN model, with a guided saliency technique and identified a total of 2090 cancer markers (108 per class on average). The concordance of differential expression of these markers between the cancer type they represent and the other types was confirmed. In breast cancer, for instance, our model identified well-known markers such as GATA3 and ESR1. Finally, we extended the 1D-CNN model to predict breast cancer subtypes and achieved an average accuracy of 88.42% among 5 subtypes. The code can be found at https://github.com/chenlabgccri/CancerTypePrediction. CONCLUSIONS: Here we present novel CNN designs for accurate, simultaneous prediction of cancer versus normal and of cancer type based on gene expression profiles, together with a unique model interpretation scheme that elucidates the biological relevance of cancer marker genes after eliminating the effects of tissue of origin. The proposed model has relatively few hyperparameters to train and can therefore be easily adapted to facilitate cancer diagnosis in the future.
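The two mechanisms the abstract names, a 1D convolution sliding over a gene expression vector followed by a dense softmax classifier, and a guided saliency map that scores each input gene for a predicted class, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is in the CancerTypePrediction repository). The toy sizes (12 genes, 3 classes, kernel width 4, stride 4), the random weights, and the helper names `conv1d_forward`, `forward`, and `saliency` are hypothetical. "Guided saliency" is assumed here to mean guided backpropagation through the ReLU, which is an assumption rather than something stated in the record.

```python
import numpy as np

def conv1d_forward(x, W, b, stride):
    """Valid 1D convolution of a gene-expression vector x with filters W (n_filters x k)."""
    n_filters, k = W.shape
    n_windows = (x.shape[0] - k) // stride + 1
    # windows[w] holds the genes x[w*stride : w*stride + k]
    windows = np.stack([x[w * stride: w * stride + k] for w in range(n_windows)])
    return windows @ W.T + b  # shape (n_windows, n_filters)

def forward(x, params, stride):
    """Conv -> ReLU -> flatten -> dense layer producing one logit per class."""
    z = conv1d_forward(x, params["W"], params["b"], stride)
    h = np.maximum(z, 0.0)  # ReLU
    logits = params["D"] @ h.ravel() + params["bd"]
    return z, h, logits

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

def saliency(x, params, stride, target, guided=True):
    """Gradient of the target-class logit w.r.t. each input gene.

    With guided=True, the guided-backpropagation rule is applied at the ReLU:
    only positive gradients flowing through active units are kept.
    """
    W = params["W"]
    k = W.shape[1]
    z, h, _ = forward(x, params, stride)
    g_h = params["D"][target].reshape(h.shape)  # d logit / d h (matches h.ravel() order)
    g_z = g_h * (z > 0)                         # ReLU derivative
    if guided:
        g_z = g_z * (g_h > 0)                   # guided backprop: drop negative gradients
    g_x = np.zeros_like(x)
    for w in range(z.shape[0]):
        # each window contributes sum_f g_z[w, f] * W[f, :] to the genes it covers
        g_x[w * stride: w * stride + k] += W.T @ g_z[w]
    return g_x

# Usage on toy, randomly initialised weights (hypothetical sizes).
rng = np.random.default_rng(0)
n_genes, k, stride, n_filters, n_classes = 12, 4, 4, 2, 3
n_windows = (n_genes - k) // stride + 1
params = {
    "W": rng.normal(size=(n_filters, k)),
    "b": rng.normal(size=n_filters),
    "D": rng.normal(size=(n_classes, n_windows * n_filters)),
    "bd": rng.normal(size=n_classes),
}
x = rng.normal(size=n_genes)                      # one sample's expression profile
probs = softmax(forward(x, params, stride)[2])    # class probabilities
pred = int(np.argmax(probs))                      # predicted class
sal = saliency(x, params, stride, pred)           # guided saliency per gene
top_genes = np.argsort(-np.abs(sal))[:3]          # candidate "marker" genes for pred
```

Ranking genes by the magnitude of the saliency for the predicted class mirrors, in miniature, how a saliency map can be turned into a per-class list of candidate marker genes. Setting `guided=False` gives the plain input gradient, which is easy to check against finite differences.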
format	Online Article Text
id	pubmed-7119277
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7119277 2020-04-07; BMC Med Genomics; Research; BioMed Central 2020-04-03; /pmc/articles/PMC7119277/ /pubmed/32241303 http://dx.doi.org/10.1186/s12920-020-0677-2; © The Author(s). 2020 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Convolutional neural network models for cancer type prediction based on gene expression
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7119277/ https://www.ncbi.nlm.nih.gov/pubmed/32241303 http://dx.doi.org/10.1186/s12920-020-0677-2
