DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations

BACKGROUND: With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accu...

Bibliographic Details
Main Authors: Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5259816/
https://www.ncbi.nlm.nih.gov/pubmed/28155641
http://dx.doi.org/10.1186/s12859-016-1334-9
author Yuan, Yuchen
Shi, Yi
Li, Changyang
Kim, Jinman
Cai, Weidong
Han, Zeguang
Feng, David Dagan
author_sort Yuan, Yuchen
collection PubMed
description BACKGROUND: With the development of DNA sequencing technology, large amounts of sequencing data have become available in recent years, providing unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However, in existing SMCC methods, high data sparsity, small sample sizes, and the use of simple linear classifiers are major obstacles to improving classification performance.
RESULTS: To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier that consists of three steps. First, clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes. Second, indexed sparsity reduction (ISR) converts the gene data into the indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity. Finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute to improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene achieves at least a 24% improvement in testing accuracy.
CONCLUSIONS: Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier that addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module, which is able to extract high-level features relating combinatorial somatic point mutations to cancer types.
format Online
Article
Text
id pubmed-5259816
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-52598162017-01-26 DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations Yuan, Yuchen Shi, Yi Li, Changyang Kim, Jinman Cai, Weidong Han, Zeguang Feng, David Dagan BMC Bioinformatics Research BACKGROUND: With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. RESULTS: To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. 
CONCLUSIONS: Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types. BioMed Central 2016-12-23 /pmc/articles/PMC5259816/ /pubmed/28155641 http://dx.doi.org/10.1186/s12859-016-1334-9 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5259816/
https://www.ncbi.nlm.nih.gov/pubmed/28155641
http://dx.doi.org/10.1186/s12859-016-1334-9