
Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations

An accurate classification of human cancer, including its primary site, is important for a better understanding of cancer and for developing effective therapeutic strategies. The large volume of available somatic mutation data provides a great opportunity to investigate cancer classification using machine lea...


Bibliographic Details
Main Authors: Chen, Yukun; Sun, Jingchun; Huang, Liang-Chin; Xu, Hua; Zhao, Zhongming
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4619847/
https://www.ncbi.nlm.nih.gov/pubmed/26539502
http://dx.doi.org/10.1155/2015/491502
author Chen, Yukun
Sun, Jingchun
Huang, Liang-Chin
Xu, Hua
Zhao, Zhongming
collection PubMed
description An accurate classification of human cancer, including its primary site, is important for a better understanding of cancer and for developing effective therapeutic strategies. The large volume of available somatic mutation data provides a great opportunity to investigate cancer classification using machine learning. Here, we explored the patterns of 1,760,846 somatic mutations identified from 230,255 cancer patients, along with gene function information, using a support vector machine. Specifically, we performed a multiclass classification experiment over 17 tumor sites using the gene symbol, somatic mutation, chromosome, and gene functional pathway as predictors for 6,751 subjects. The baseline using only gene features achieved an accuracy of 0.57, which improved to 0.62 when mutation and chromosome information was added. Among the predictable primary tumor sites, five (large intestine, liver, skin, pancreas, and lung) were predicted with an F-measure above 0.70. The large intestine model ranked first, with an F-measure of 0.87. The results demonstrate that somatic mutation information is useful for predicting primary tumor sites with machine learning models. To our knowledge, this study is the first investigation of primary site classification using machine learning and somatic mutation data.
format Online
Article
Text
id pubmed-4619847
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-46198472015-11-04 Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations Chen, Yukun Sun, Jingchun Huang, Liang-Chin Xu, Hua Zhao, Zhongming Biomed Res Int Research Article An accurate classification of human cancer, including its primary site, is important for better understanding of cancer and effective therapeutic strategies development. The available big data of somatic mutations provides us a great opportunity to investigate cancer classification using machine learning. Here, we explored the patterns of 1,760,846 somatic mutations identified from 230,255 cancer patients along with gene function information using support vector machine. Specifically, we performed a multiclass classification experiment over the 17 tumor sites using the gene symbol, somatic mutation, chromosome, and gene functional pathway as predictors for 6,751 subjects. The performance of the baseline using only gene features is 0.57 in accuracy. It was improved to 0.62 when adding the information of mutation and chromosome. Among the predictable primary tumor sites, the prediction of five primary sites (large intestine, liver, skin, pancreas, and lung) could achieve the performance with more than 0.70 in F-measure. The model of the large intestine ranked the first with 0.87 in F-measure. The results demonstrate that the somatic mutation information is useful for prediction of primary tumor sites with machine learning modeling. To our knowledge, this study is the first investigation of the primary sites classification using machine learning and somatic mutation data. Hindawi Publishing Corporation 2015 2015-10-11 /pmc/articles/PMC4619847/ /pubmed/26539502 http://dx.doi.org/10.1155/2015/491502 Text en Copyright © 2015 Yukun Chen et al. 
https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations
topic Research Article