DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations
BACKGROUND: With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accu...
Main authors: | Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5259816/ https://www.ncbi.nlm.nih.gov/pubmed/28155641 http://dx.doi.org/10.1186/s12859-016-1334-9 |
_version_ | 1782499279494971392 |
---|---|
author | Yuan, Yuchen Shi, Yi Li, Changyang Kim, Jinman Cai, Weidong Han, Zeguang Feng, David Dagan |
author_facet | Yuan, Yuchen Shi, Yi Li, Changyang Kim, Jinman Cai, Weidong Han, Zeguang Feng, David Dagan |
author_sort | Yuan, Yuchen |
collection | PubMed |
description | BACKGROUND: With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. RESULTS: To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. CONCLUSIONS: Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. 
Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types. |
format | Online Article Text |
id | pubmed-5259816 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-52598162017-01-26 DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations Yuan, Yuchen Shi, Yi Li, Changyang Kim, Jinman Cai, Weidong Han, Zeguang Feng, David Dagan BMC Bioinformatics Research BACKGROUND: With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. RESULTS: To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. 
CONCLUSIONS: Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types. BioMed Central 2016-12-23 /pmc/articles/PMC5259816/ /pubmed/28155641 http://dx.doi.org/10.1186/s12859-016-1334-9 Text en © The Author(s). 2016 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Yuan, Yuchen Shi, Yi Li, Changyang Kim, Jinman Cai, Weidong Han, Zeguang Feng, David Dagan DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations |
title | DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations |
title_full | DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations |
title_fullStr | DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations |
title_full_unstemmed | DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations |
title_short | DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations |
title_sort | deepgene: an advanced cancer type classifier based on deep learning and somatic point mutations |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5259816/ https://www.ncbi.nlm.nih.gov/pubmed/28155641 http://dx.doi.org/10.1186/s12859-016-1334-9 |
work_keys_str_mv | AT yuanyuchen deepgeneanadvancedcancertypeclassifierbasedondeeplearningandsomaticpointmutations AT shiyi deepgeneanadvancedcancertypeclassifierbasedondeeplearningandsomaticpointmutations AT lichangyang deepgeneanadvancedcancertypeclassifierbasedondeeplearningandsomaticpointmutations AT kimjinman deepgeneanadvancedcancertypeclassifierbasedondeeplearningandsomaticpointmutations AT caiweidong deepgeneanadvancedcancertypeclassifierbasedondeeplearningandsomaticpointmutations AT hanzeguang deepgeneanadvancedcancertypeclassifierbasedondeeplearningandsomaticpointmutations AT fengdaviddagan deepgeneanadvancedcancertypeclassifierbasedondeeplearningandsomaticpointmutations |