Cargando…
Study of Integrated Heterogeneous Data Reveals Prognostic Power of Gene Expression for Breast Cancer Survival
BACKGROUND: Studies show that thousands of genes are associated with prognosis of breast cancer. Towards utilizing available genetic data, efforts have been made to predict outcomes using gene expression data, and a number of commercial products have been developed. These products have the following...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Public Library of Science
2015
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4344205/ https://www.ncbi.nlm.nih.gov/pubmed/25723490 http://dx.doi.org/10.1371/journal.pone.0117658 |
_version_ | 1782359378652823552 |
---|---|
author | Neapolitan, Richard E. Jiang, Xia |
author_facet | Neapolitan, Richard E. Jiang, Xia |
author_sort | Neapolitan, Richard E. |
collection | PubMed |
description | BACKGROUND: Studies show that thousands of genes are associated with prognosis of breast cancer. Towards utilizing available genetic data, efforts have been made to predict outcomes using gene expression data, and a number of commercial products have been developed. These products have the following shortcomings: 1) They use the Cox model for prediction. However, the RSF model has been shown to significantly outperform the Cox model. 2) Testing was not done to see if a complete set of clinical predictors could predict as well as the gene expression signatures. METHODOLOGY/FINDINGS: We address these shortcomings. The METABRIC data set concerns 1981 breast cancer tumors. Features include 21 clinical features, expression levels for 16,384 genes, and survival. We compare the survival prediction performance of the Cox model and the RSF model using the clinical data and the gene expression data to their performance using only the clinical data. We obtain significantly better results when we used both clinical data and gene expression data for 5 year, 10 year, and 15 year survival prediction. When we replace the gene expression data by PAM50 subtype, our results are significant only for 5 year and 15 year prediction. We obtain significantly better results using the RSF model over the Cox model. Finally, our results indicate that gene expression data alone may predict long-term survival. CONCLUSIONS/SIGNIFICANCE: Our results indicate that we can obtain improved survival prediction using clinical data and gene expression data compared to prediction using only clinical data. We further conclude that we can obtain improved survival prediction using the RSF model instead of the Cox model. These results are significant because by incorporating more gene expression data with clinical features and using the RSF model, we could develop decision support systems that better utilize heterogeneous information to improve outcome prediction and decision making. |
format | Online Article Text |
id | pubmed-4344205 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-43442052015-03-04 Study of Integrated Heterogeneous Data Reveals Prognostic Power of Gene Expression for Breast Cancer Survival Neapolitan, Richard E. Jiang, Xia PLoS One Research Article BACKGROUND: Studies show that thousands of genes are associated with prognosis of breast cancer. Towards utilizing available genetic data, efforts have been made to predict outcomes using gene expression data, and a number of commercial products have been developed. These products have the following shortcomings: 1) They use the Cox model for prediction. However, the RSF model has been shown to significantly outperform the Cox model. 2) Testing was not done to see if a complete set of clinical predictors could predict as well as the gene expression signatures. METHODOLOGY/FINDINGS: We address these shortcomings. The METABRIC data set concerns 1981 breast cancer tumors. Features include 21 clinical features, expression levels for 16,384 genes, and survival. We compare the survival prediction performance of the Cox model and the RSF model using the clinical data and the gene expression data to their performance using only the clinical data. We obtain significantly better results when we used both clinical data and gene expression data for 5 year, 10 year, and 15 year survival prediction. When we replace the gene expression data by PAM50 subtype, our results are significant only for 5 year and 15 year prediction. We obtain significantly better results using the RSF model over the Cox model. Finally, our results indicate that gene expression data alone may predict long-term survival. CONCLUSIONS/SIGNIFICANCE: Our results indicate that we can obtain improved survival prediction using clinical data and gene expression data compared to prediction using only clinical data. We further conclude that we can obtain improved survival prediction using the RSF model instead of the Cox model. These results are significant because by incorporating more gene expression data with clinical features and using the RSF model, we could develop decision support systems that better utilize heterogeneous information to improve outcome prediction and decision making. Public Library of Science 2015-02-27 /pmc/articles/PMC4344205/ /pubmed/25723490 http://dx.doi.org/10.1371/journal.pone.0117658 Text en © 2015 Neapolitan, Jiang http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Neapolitan, Richard E. Jiang, Xia Study of Integrated Heterogeneous Data Reveals Prognostic Power of Gene Expression for Breast Cancer Survival |
title | Study of Integrated Heterogeneous Data Reveals Prognostic Power of Gene Expression for Breast Cancer Survival |
title_full | Study of Integrated Heterogeneous Data Reveals Prognostic Power of Gene Expression for Breast Cancer Survival |
title_fullStr | Study of Integrated Heterogeneous Data Reveals Prognostic Power of Gene Expression for Breast Cancer Survival |
title_full_unstemmed | Study of Integrated Heterogeneous Data Reveals Prognostic Power of Gene Expression for Breast Cancer Survival |
title_short | Study of Integrated Heterogeneous Data Reveals Prognostic Power of Gene Expression for Breast Cancer Survival |
title_sort | study of integrated heterogeneous data reveals prognostic power of gene expression for breast cancer survival |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4344205/ https://www.ncbi.nlm.nih.gov/pubmed/25723490 http://dx.doi.org/10.1371/journal.pone.0117658 |
work_keys_str_mv | AT neapolitanricharde studyofintegratedheterogeneousdatarevealsprognosticpowerofgeneexpressionforbreastcancersurvival AT jiangxia studyofintegratedheterogeneousdatarevealsprognosticpowerofgeneexpressionforbreastcancersurvival |