
Classification of breast cancer patients using somatic mutation profiles and machine learning approaches

BACKGROUND: The high degree of heterogeneity observed in breast cancers makes it very difficult to classify the cancer patients into distinct clinical subgroups and consequently limits the ability to devise effective therapeutic strategies. Several classification strategies based on ER/PR/HER2 expre...


Bibliographic Details
Main Authors: Vural, Suleyman; Wang, Xiaosheng; Guda, Chittibabu
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009820/
https://www.ncbi.nlm.nih.gov/pubmed/27587275
http://dx.doi.org/10.1186/s12918-016-0306-z
author Vural, Suleyman
Wang, Xiaosheng
Guda, Chittibabu
author_facet Vural, Suleyman
Wang, Xiaosheng
Guda, Chittibabu
author_sort Vural, Suleyman
collection PubMed
description BACKGROUND: The high degree of heterogeneity observed in breast cancers makes it very difficult to classify patients into distinct clinical subgroups and consequently limits the ability to devise effective therapeutic strategies. Several classification strategies based on ER/PR/HER2 expression or the expression profiles of a panel of genes have helped, but such methods often produce misleading results due to the dynamic nature of gene expression. In contrast, somatic DNA mutations are relatively stable and lead to the initiation and progression of many sporadic cancers. Hence, in this study, we explore the use of gene mutation profiles to classify, characterize and predict the subgroups of breast cancers. RESULTS: We analyzed the whole exome sequencing data from 358 ethnically similar breast cancer patients in The Cancer Genome Atlas (TCGA) project. Somatic and non-synonymous single nucleotide variants identified from each patient were assigned a quantitative score (C-score) that represents the extent of negative impact on the gene function. Using these scores with the non-negative matrix factorization method, we clustered the patients into three subgroups. By comparing the clinical stage of patients, we identified an early-stage-enriched and a late-stage-enriched subgroup. Comparison of the mutation scores of the early-stage-enriched and late-stage-enriched subgroups identified 358 genes that carry significantly higher mutation rates in the late-stage-enriched subgroup. Functional characterization of these genes revealed important functional gene families that carry a heavy mutational load in the late-stage-enriched subgroup of patients. Finally, using the identified subgroups, we also developed a supervised classification model to predict the stage of the patients. CONCLUSIONS: This study demonstrates that gene mutation profiles can be effectively used with unsupervised machine-learning methods to identify clinically distinguishable breast cancer subgroups. The classification model developed in this study could provide a reasonable prediction of the cancer patients’ stage solely based on their mutation profiles. This study represents the first use of only somatic mutation profile data to identify and predict breast cancer subgroups, and this generic methodology can also be applied to other cancer datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12918-016-0306-z) contains supplementary material, which is available to authorized users.
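The clustering step described in the abstract (a matrix of gene-level mutation scores factored with non-negative matrix factorization, then patients grouped by their dominant factor) can be illustrated with a small sketch. This is not the authors' implementation: the record does not say which NMF variant or software was used, so the sketch assumes Lee-Seung multiplicative updates for the squared-error objective, a handful of random restarts, and a made-up 6 x 6 score matrix. All function names (`nmf`, `best_nmf`, `assign_clusters`) are hypothetical, and the code is kept dependency-free for readability; a real analysis would likely use a numerical library.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def squared_error(v, w, h):
    """Squared Frobenius distance between v and the product w h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))

def nmf(v, rank, iterations=300, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (genes x patients) as w h using
    multiplicative updates (an assumed variant, not taken from the paper)."""
    rng = random.Random(seed)
    n_genes, n_patients = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_patients)] for _ in range(rank)]
    for _ in range(iterations):
        # Update h with w held fixed.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_patients)]
             for k in range(rank)]
        # Update w with the new h held fixed.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_genes)]
    return w, h

def best_nmf(v, rank, restarts=10):
    """Run nmf from several random starts and keep the lowest-error result."""
    runs = [nmf(v, rank, seed=s) for s in range(restarts)]
    return min(runs, key=lambda wh: squared_error(v, wh[0], wh[1]))

def assign_clusters(h):
    """Label each patient (column of h) with the index of its largest coefficient."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]

# Made-up gene-level scores: rows are genes, columns are patients.
scores = [
    [5.0, 4.0, 0.0, 0.0, 0.0, 0.0],
    [3.0, 2.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 6.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 4.0, 4.5],
    [0.0, 0.0, 0.0, 0.0, 3.0, 3.5],
]
w, h = best_nmf(scores, rank=3)
labels = assign_clusters(h)
print(labels)
```

In this toy matrix each pair of patients shares mutations in a distinct pair of genes, so patients in the same pair should receive the same label; the label numbers themselves are arbitrary. The columns of `w` indicate which genes drive each factor, which is the kind of information a downstream comparison of subgroups would start from.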
format Online
Article
Text
id pubmed-5009820
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5009820 2016-09-09 Classification of breast cancer patients using somatic mutation profiles and machine learning approaches Vural, Suleyman Wang, Xiaosheng Guda, Chittibabu BMC Syst Biol Research BioMed Central 2016-08-26 /pmc/articles/PMC5009820/ /pubmed/27587275 http://dx.doi.org/10.1186/s12918-016-0306-z Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Classification of breast cancer patients using somatic mutation profiles and machine learning approaches
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009820/
https://www.ncbi.nlm.nih.gov/pubmed/27587275
http://dx.doi.org/10.1186/s12918-016-0306-z