Classification of breast cancer patients using somatic mutation profiles and machine learning approaches
BACKGROUND: The high degree of heterogeneity observed in breast cancers makes it very difficult to classify the cancer patients into distinct clinical subgroups and consequently limits the ability to devise effective therapeutic strategies. Several classification strategies based on ER/PR/HER2 expre...
Main authors: | Vural, Suleyman; Wang, Xiaosheng; Guda, Chittibabu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009820/ https://www.ncbi.nlm.nih.gov/pubmed/27587275 http://dx.doi.org/10.1186/s12918-016-0306-z |
_version_ | 1782451582668898304 |
---|---|
author | Vural, Suleyman Wang, Xiaosheng Guda, Chittibabu |
author_facet | Vural, Suleyman Wang, Xiaosheng Guda, Chittibabu |
author_sort | Vural, Suleyman |
collection | PubMed |
description | BACKGROUND: The high degree of heterogeneity observed in breast cancers makes it very difficult to classify the cancer patients into distinct clinical subgroups and consequently limits the ability to devise effective therapeutic strategies. Several classification strategies based on ER/PR/HER2 expression or the expression profiles of a panel of genes have helped, but such methods often produce misleading results due to their dynamic nature. In contrast, somatic DNA mutations are relatively stable and lead to initiation and progression of many sporadic cancers. Hence in this study, we explore the use of gene mutation profiles to classify, characterize and predict the subgroups of breast cancers. RESULTS: We analyzed the whole exome sequencing data from 358 ethnically similar breast cancer patients in The Cancer Genome Atlas (TCGA) project. Somatic and non-synonymous single nucleotide variants identified from each patient were assigned a quantitative score (C-score) that represents the extent of negative impact on the gene function. Using these scores with the non-negative matrix factorization method, we clustered the patients into three subgroups. By comparing the clinical stage of patients, we identified an early-stage-enriched and a late-stage-enriched subgroup. Comparison of the mutation scores of early and late-stage-enriched subgroups identified 358 genes that carry significantly higher mutation rates in the late-stage subgroup. Functional characterization of these genes revealed important functional gene families that carry a heavy mutational load in the late-stage-enriched subgroup of patients. Finally, using the identified subgroups, we also developed a supervised classification model to predict the stage of the patients. CONCLUSIONS: This study demonstrates that gene mutation profiles can be effectively used with unsupervised machine-learning methods to identify clinically distinguishable breast cancer subgroups. The classification model developed in this study could provide a reasonable prediction of the cancer patients’ stage solely based on their mutation profiles. This study represents the first use of only somatic mutation profile data to identify and predict breast cancer subgroups and this generic methodology can also be applied to other cancer datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12918-016-0306-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5009820 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5009820 2016-09-09 Classification of breast cancer patients using somatic mutation profiles and machine learning approaches Vural, Suleyman Wang, Xiaosheng Guda, Chittibabu BMC Syst Biol Research BACKGROUND: The high degree of heterogeneity observed in breast cancers makes it very difficult to classify the cancer patients into distinct clinical subgroups and consequently limits the ability to devise effective therapeutic strategies. Several classification strategies based on ER/PR/HER2 expression or the expression profiles of a panel of genes have helped, but such methods often produce misleading results due to their dynamic nature. In contrast, somatic DNA mutations are relatively stable and lead to initiation and progression of many sporadic cancers. Hence in this study, we explore the use of gene mutation profiles to classify, characterize and predict the subgroups of breast cancers. RESULTS: We analyzed the whole exome sequencing data from 358 ethnically similar breast cancer patients in The Cancer Genome Atlas (TCGA) project. Somatic and non-synonymous single nucleotide variants identified from each patient were assigned a quantitative score (C-score) that represents the extent of negative impact on the gene function. Using these scores with the non-negative matrix factorization method, we clustered the patients into three subgroups. By comparing the clinical stage of patients, we identified an early-stage-enriched and a late-stage-enriched subgroup. Comparison of the mutation scores of early and late-stage-enriched subgroups identified 358 genes that carry significantly higher mutation rates in the late-stage subgroup. Functional characterization of these genes revealed important functional gene families that carry a heavy mutational load in the late-stage-enriched subgroup of patients. Finally, using the identified subgroups, we also developed a supervised classification model to predict the stage of the patients. CONCLUSIONS: This study demonstrates that gene mutation profiles can be effectively used with unsupervised machine-learning methods to identify clinically distinguishable breast cancer subgroups. The classification model developed in this study could provide a reasonable prediction of the cancer patients’ stage solely based on their mutation profiles. This study represents the first use of only somatic mutation profile data to identify and predict breast cancer subgroups and this generic methodology can also be applied to other cancer datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12918-016-0306-z) contains supplementary material, which is available to authorized users. BioMed Central 2016-08-26 /pmc/articles/PMC5009820/ /pubmed/27587275 http://dx.doi.org/10.1186/s12918-016-0306-z Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Vural, Suleyman Wang, Xiaosheng Guda, Chittibabu Classification of breast cancer patients using somatic mutation profiles and machine learning approaches |
title | Classification of breast cancer patients using somatic mutation profiles and machine learning approaches |
title_full | Classification of breast cancer patients using somatic mutation profiles and machine learning approaches |
title_fullStr | Classification of breast cancer patients using somatic mutation profiles and machine learning approaches |
title_full_unstemmed | Classification of breast cancer patients using somatic mutation profiles and machine learning approaches |
title_short | Classification of breast cancer patients using somatic mutation profiles and machine learning approaches |
title_sort | classification of breast cancer patients using somatic mutation profiles and machine learning approaches |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009820/ https://www.ncbi.nlm.nih.gov/pubmed/27587275 http://dx.doi.org/10.1186/s12918-016-0306-z |
work_keys_str_mv | AT vuralsuleyman classificationofbreastcancerpatientsusingsomaticmutationprofilesandmachinelearningapproaches AT wangxiaosheng classificationofbreastcancerpatientsusingsomaticmutationprofilesandmachinelearningapproaches AT gudachittibabu classificationofbreastcancerpatientsusingsomaticmutationprofilesandmachinelearningapproaches |
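
Illustrative note on the clustering step: the RESULTS section of the abstract describes clustering patients into three subgroups by applying non-negative matrix factorization (NMF) to per-gene mutation scores. The sketch below shows one way such a step could look. It is not the authors' implementation. The toy score matrix, the patients-by-genes orientation, the multiplicative-update algorithm, the rule of assigning each patient to its largest factor, and every function name are assumptions made for illustration. The standard library alone is used so the example runs anywhere.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH (the quantity NMF tries to minimize here)."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rr in zip(v, wh) for x, y in zip(rv, rr))


def nmf(v, k, iterations=300, seed=0, eps=1e-9):
    """Factor V (patients x genes) into W (patients x k) and H (k x genes).

    Uses Lee-Seung multiplicative updates, which keep W and H non-negative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # Update H: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update W: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def assign_clusters(w):
    """Assign each patient (row of W) to the factor with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Hypothetical data: 6 patients x 6 genes, made-up non-negative scores.
toy_scores = [
    [20, 15, 0, 0, 0, 0],
    [18, 22, 0, 0, 0, 0],
    [0, 0, 25, 12, 0, 0],
    [0, 0, 19, 17, 0, 0],
    [0, 0, 0, 0, 14, 21],
    [0, 0, 0, 0, 23, 16],
]

w, h = nmf(toy_scores, k=3)
labels = assign_clusters(w)
print("cluster label per patient:", labels)
print("reconstruction error:", frobenius_error(toy_scores, w, h))
```

NMF can converge to different local solutions depending on the random start, so real analyses typically repeat the factorization over many seeds and check cluster stability. A maintained implementation such as scikit-learn's `sklearn.decomposition.NMF` would normally replace the hand-written update loop.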