
Prediction of Early Breast Cancer Metastasis from DNA Microarray Data Using High-Dimensional Cox Regression Models

BACKGROUND: DNA microarray studies identified gene expression signatures predictive of metastatic relapse in early breast cancer. Standard feature selection procedures applied to reduce the set of predictive genes did not take into account the correlation between genes. In this paper, we studied the...

Bibliographic Details
Main authors: Zemmour, Christophe, Bertucci, François, Finetti, Pascal, Chetrit, Bernard, Birnbaum, Daniel, Filleron, Thomas, Boher, Jean-Marie
Format: Online Article Text
Language: English
Published: Libertas Academica 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4426954/
https://www.ncbi.nlm.nih.gov/pubmed/25983547
http://dx.doi.org/10.4137/CIN.S17284
author Zemmour, Christophe
Bertucci, François
Finetti, Pascal
Chetrit, Bernard
Birnbaum, Daniel
Filleron, Thomas
Boher, Jean-Marie
author_sort Zemmour, Christophe
collection PubMed
description BACKGROUND: DNA microarray studies have identified gene expression signatures predictive of metastatic relapse in early breast cancer. Standard feature selection procedures applied to reduce the set of predictive genes did not take into account the correlation between genes. In this paper, we studied the performance of three high-dimensional regression methods, namely CoxBoost, LASSO (Least Absolute Shrinkage and Selection Operator), and Elastic net, for identifying prognostic signatures in patients with early breast cancer.
METHODS: We analyzed three public retrospective datasets, including a total of 384 patients with axillary lymph node-negative breast cancer. The Amsterdam training set of van’t Veer (78 patients) was used to determine the optimal gene sets and classifiers, using sensitivity thresholds that misclassified no more than 10% of the poor-prognosis group. To ensure comparability between methods, an automatic selection procedure was used to determine the number of genes included in each model. The van de Vijver and Desmedt datasets were used as separate validation sets to evaluate the prognostic performance of our classifiers. The results were compared with the original Amsterdam 70-gene classifier.
RESULTS: The automatic selection procedure reduced the number of predictive genes to as few as six. In the two validation sets, the three models (Elastic net, LASSO, and CoxBoost) yielded genomic classifiers that predicted the 5-year metastatic status with similar performance: respectively 59, 56, and 54% accuracy; 83, 75, and 83% sensitivity; and 53, 52, and 48% specificity in the Desmedt dataset. In comparison, the Amsterdam 70-gene signature showed 45% accuracy, 97% sensitivity, and 34% specificity. The gene overlap and the classification concordance between the three classifiers were high. All the classifiers added significant prognostic information to that provided by the traditional prognostic factors. They also showed a very high overlap in the gene ontologies (GOs) associated with genes overexpressed in the predicted poor-prognosis vs. good-prognosis classes, and these ontologies centred on cell proliferation. Interestingly, all classifiers showed high sensitivity in predicting the 4-year status of metastatic disease.
CONCLUSIONS: High-dimensional regression methods are attractive in prognostic studies because finding a small subset of genes may facilitate the transfer to the clinic, and also because they strengthen the robustness of the model by limiting the selection of false-positive predictive genes. With only six genes, the CoxBoost classifier predicted the 4-year status of metastatic disease with 93% sensitivity. Selecting a few genes related to ontologies other than cell proliferation might further improve the overall sensitivity.
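The thresholding rule described in METHODS (misclassify no more than 10% of the poor-prognosis group in the training set) and the accuracy, sensitivity, and specificity figures reported in RESULTS can be illustrated with a minimal sketch. It assumes each patient already has a numeric risk score from some fitted model (for example, the linear predictor of a penalized Cox model) and that a higher score means a worse prognosis. The function names, the percentage parameter, and the toy data are hypothetical; this is not the authors' code, and the abstract does not specify how their thresholds were computed.

```python
from typing import Dict, Sequence


def pick_threshold(scores: Sequence[float], poor: Sequence[bool],
                   max_miss_percent: int = 10) -> float:
    """Return a cut-off so that at most max_miss_percent of the
    poor-prognosis patients fall below it (i.e. are misclassified as good).

    Assumption: patients with score >= cut-off are predicted poor prognosis.
    """
    poor_scores = sorted(s for s, p in zip(scores, poor) if p)
    if not poor_scores:
        raise ValueError("need at least one poor-prognosis patient")
    # Number of poor-prognosis patients we are allowed to miss (rounded down).
    allowed = len(poor_scores) * max_miss_percent // 100
    # Everything strictly below this value is at most `allowed` patients.
    return poor_scores[allowed]


def evaluate(scores: Sequence[float], poor: Sequence[bool],
             threshold: float) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity at a given cut-off."""
    tp = fn = tn = fp = 0
    for s, p in zip(scores, poor):
        predicted_poor = s >= threshold
        if p and predicted_poor:
            tp += 1
        elif p:
            fn += 1
        elif predicted_poor:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy data (hypothetical): 10 poor-prognosis and 6 good-prognosis patients.
toy_scores = [0.35, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3,
              0.1, 0.2, 0.3, 0.4, 0.55, 0.65]
toy_poor = [True] * 10 + [False] * 6

cutoff = pick_threshold(toy_scores, toy_poor)
print(cutoff, evaluate(toy_scores, toy_poor, cutoff))
```

In a setting like the one described, the cut-off would be chosen on the training set only and then applied unchanged to the validation sets, which is why the two steps are kept as separate functions here.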
format Online
Article
Text
id pubmed-4426954
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-4426954 2015-05-15 Prediction of Early Breast Cancer Metastasis from DNA Microarray Data Using High-Dimensional Cox Regression Models Zemmour, Christophe Bertucci, François Finetti, Pascal Chetrit, Bernard Birnbaum, Daniel Filleron, Thomas Boher, Jean-Marie Cancer Inform Original Research Libertas Academica 2015-05-05 /pmc/articles/PMC4426954/ /pubmed/25983547 http://dx.doi.org/10.4137/CIN.S17284 Text en © 2015 the author(s), publisher and licensee Libertas Academica Limited. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
title Prediction of Early Breast Cancer Metastasis from DNA Microarray Data Using High-Dimensional Cox Regression Models
title_sort prediction of early breast cancer metastasis from dna microarray data using high-dimensional cox regression models
topic Original Research