
CPEM: Accurate cancer type classification based on somatic alterations using an ensemble of a random forest and a deep neural network



Bibliographic Details
Main authors: Lee, Kanggeun; Jeong, Hyoung-oh; Lee, Semin; Jeong, Won-Ki
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6858312/
https://www.ncbi.nlm.nih.gov/pubmed/31729414
http://dx.doi.org/10.1038/s41598-019-53034-3
collection PubMed
description With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need for the classification of cancer type based on somatic alterations detected from sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations that can be derived from genomic data, and further utilize them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive the ensemble-based cancer type prediction model achieving up to 84% classification accuracy in the nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy.
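The abstract reports accuracy from nested 10-fold cross-validation. The following is a minimal, hypothetical Python sketch of that evaluation protocol, not the authors' code: outer folds estimate accuracy on held-out samples, while an inner 10-fold loop run only on each outer training split chooses a hyperparameter. The helper names `kfold_splits`, `cv_score`, `nested_cv` and the toy 1-D k-nearest-neighbour model `fit_knn` are illustrative assumptions standing in for CPEM's random forest and deep neural network ensemble.

```python
import random
from statistics import mean


def kfold_splits(indices, k, seed=0):
    """Shuffle `indices` and yield (train, test) index lists for k disjoint folds."""
    idx = list(indices)
    if not 2 <= k <= len(idx):
        raise ValueError("need 2 <= k <= number of samples")
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin split; fold sizes differ by at most 1
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def accuracy(model, X, y, idx):
    """Fraction of samples in `idx` whose predicted label matches y."""
    return mean(1.0 if model(X[i]) == y[i] else 0.0 for i in idx)


def cv_score(X, y, train, param, fit, k, seed):
    """Mean validation accuracy of `param` using k-fold CV restricted to `train`."""
    scores = []
    for tr, va in kfold_splits(train, k, seed):
        model = fit([X[i] for i in tr], [y[i] for i in tr], param)
        scores.append(accuracy(model, X, y, va))
    return mean(scores)


def nested_cv(X, y, params, fit, k_outer=10, k_inner=10, seed=0):
    """Outer folds estimate accuracy; inner folds (outer-training data only) pick the parameter."""
    outer_scores = []
    for train, test in kfold_splits(range(len(X)), k_outer, seed):
        # Model selection never sees the outer test fold.
        best = max(params, key=lambda p: cv_score(X, y, train, p, fit, k_inner, seed + 1))
        # Refit with the chosen parameter on all outer-training samples, then score once.
        model = fit([X[i] for i in train], [y[i] for i in train], best)
        outer_scores.append(accuracy(model, X, y, test))
    return mean(outer_scores)


def fit_knn(X_train, y_train, k):
    """Hypothetical toy classifier: k-nearest neighbours on 1-D features, majority vote."""
    def predict(x):
        nearest = sorted(range(len(X_train)), key=lambda i: abs(X_train[i] - x))[:k]
        votes = [y_train[i] for i in nearest]
        return max(set(votes), key=votes.count)
    return predict


if __name__ == "__main__":
    X = [float(i) for i in range(100)]
    y = [0 if x < 50 else 1 for x in X]
    print(nested_cv(X, y, params=[1, 3, 5], fit=fit_knn))
```

Keeping parameter selection inside each outer training split is what keeps the outer-fold accuracy from being optimistically biased by tuning on the samples it is later scored on.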
id pubmed-6858312
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Sci Rep, published 2019-11-15 by Nature Publishing Group UK
rights © The Author(s) 2019. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
topic Article