Using logistic regression to improve the prognostic value of microarray gene expression data sets: application to early-stage squamous cell carcinoma of the lung and triple negative breast carcinoma

Bibliographic Details
Main Authors: Mount, David W, Putnam, Charles W, Centouri, Sara M, Manziello, Ann M, Pandey, Ritu, Garland, Linda L, Martinez, Jesse D
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4110620/
https://www.ncbi.nlm.nih.gov/pubmed/24916928
http://dx.doi.org/10.1186/1755-8794-7-33
author Mount, David W
Putnam, Charles W
Centouri, Sara M
Manziello, Ann M
Pandey, Ritu
Garland, Linda L
Martinez, Jesse D
collection PubMed
description BACKGROUND: Numerous microarray-based prognostic gene expression signatures of primary neoplasms have been published but often with little concurrence between studies, thus limiting their clinical utility. We describe a methodology using logistic regression, which circumvents limitations of conventional Kaplan-Meier analysis. We applied this approach to a thrice-analyzed and published squamous cell carcinoma (SQCC) of the lung data set, with the objective of identifying gene expressions predictive of early death versus long survival in early-stage disease. A similar analysis was applied to a data set of triple negative breast carcinoma cases, which present similar clinical challenges. METHODS: Important to our approach is the selection of homogeneous patient groups for comparison. In the lung study, we selected two groups (including only stages I and II), equal in size, of earliest deaths and longest survivors. Genes varying at least four-fold were tested by logistic regression for accuracy of prediction (area under a ROC plot). The gene list was refined by applying two sliding-window analyses and by validations using a leave-one-out approach and model building with validation subsets. In the breast study, a similar logistic regression analysis was used after selecting appropriate cases for comparison. RESULTS: A total of 8594 variable genes were tested for accuracy in predicting earliest deaths versus longest survivors in SQCC. After applying the two sliding-window analyses and the leave-one-out analysis, 24 prognostic genes were identified; most of them were B-cell related. When the same data set of stage I and II cases was analyzed using a conventional Kaplan-Meier (KM) approach, we identified fewer immune-related genes among the most statistically significant hits; when stage III cases were included, most of the prognostic genes were missed. Interestingly, logistic regression analysis of the breast cancer data set identified many immune-related genes predictive of clinical outcome. CONCLUSIONS: Stratification of cases based on clinical data, careful selection of two groups for comparison, and the application of logistic regression analysis substantially improved predictive accuracy in comparison to conventional KM approaches. B-cell-related genes dominated the list of prognostic genes in early-stage SQCC of the lung and triple negative breast cancer.
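The per-gene screening step summarized in the METHODS portion of the abstract (filter genes varying at least four-fold, fit a logistic regression for each gene, score it by area under the ROC curve, validate with leave-one-out) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the gene names, the expression values, the reading of "four-fold" as a log2 range of at least 2, the gradient-ascent fitting, and the per-fold standardization are all assumptions made for the example, and the two sliding-window analyses are not covered because the abstract does not describe them.

```python
import math

def four_fold_filter(log2_expr, min_log2_range=2.0):
    """Keep genes whose log2 expression spans >= min_log2_range (2.0 = four-fold).
    Assumption: 'varying at least four-fold' is read as max/min across samples."""
    return {gene: values for gene, values in log2_expr.items()
            if max(values) - min(values) >= min_log2_range}

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, steps=500, lr=0.1):
    """Fit p = sigmoid(a + b*x) by plain gradient ascent on the log-likelihood."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            residual = yi - sigmoid(a + b * xi)
            grad_a += residual
            grad_b += residual * xi
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    ranked correctly, counting ties as half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def loo_auc(values, labels):
    """Leave-one-out: refit without sample i, predict sample i, then score
    all held-out predictions together."""
    held_out = []
    for i in range(len(values)):
        train_x = values[:i] + values[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        mean = sum(train_x) / len(train_x)
        sd = math.sqrt(sum((v - mean) ** 2 for v in train_x) / len(train_x)) or 1.0
        a, b = fit_logistic([(v - mean) / sd for v in train_x], train_y)
        held_out.append(sigmoid(a + b * (values[i] - mean) / sd))
    return auc(held_out, labels)

# Hypothetical toy data: 1 = earliest deaths, 0 = longest survivors (equal group sizes).
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
log2_expr = {
    "gene_A": [4.1, 4.5, 3.9, 4.8, 4.3, 7.9, 8.4, 7.6, 8.8, 8.1],  # separates the groups
    "gene_B": [6.0, 9.1, 6.4, 8.8, 7.2, 6.2, 9.0, 6.5, 8.7, 7.1],  # variable, uninformative
    "gene_C": [5.0, 5.2, 5.1, 5.3, 5.0, 5.1, 5.2, 5.0, 5.3, 5.1],  # fails the four-fold filter
}

variable_genes = four_fold_filter(log2_expr)
ranked = sorted(((gene, loo_auc(values, labels)) for gene, values in variable_genes.items()),
                key=lambda item: -item[1])
for gene, score in ranked:
    print(gene, round(score, 2))
```

In practice a statistics package would replace the hand-written fitting loop; the point of the sketch is only the order of operations: filter, fit one model per gene, and rank genes by held-out AUC.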
format Online
Article
Text
id pubmed-4110620
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4110620 2014-07-26 BMC Med Genomics Research Article BioMed Central 2014-06-10 /pmc/articles/PMC4110620/ /pubmed/24916928 http://dx.doi.org/10.1186/1755-8794-7-33 Text en Copyright © 2014 Mount et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Using logistic regression to improve the prognostic value of microarray gene expression data sets: application to early-stage squamous cell carcinoma of the lung and triple negative breast carcinoma
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4110620/
https://www.ncbi.nlm.nih.gov/pubmed/24916928
http://dx.doi.org/10.1186/1755-8794-7-33