Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer

Bibliographic Details
Main Authors: Ali, A M G, Dawson, S-J, Blows, F M, Provenzano, E, Ellis, I O, Baglietto, L, Huntsman, D, Caldas, C, Pharoah, P D
Format: Text
Language: English
Published: Nature Publishing Group 2011
Subjects: Molecular Diagnostics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3049587/
https://www.ncbi.nlm.nih.gov/pubmed/21266980
http://dx.doi.org/10.1038/sj.bjc.6606078
collection PubMed
description BACKGROUND: Tissue micro-arrays (TMAs) are increasingly used to generate data of the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are available. However, the validity of the various approaches is dependent on the structure of the missing data and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data from a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer.
PATIENTS AND METHODS: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multi-variate Cox regression using four approaches to handling missing data – complete case analysis (CCA), mean substitution (MS) and multiple imputation without inclusion of the outcome (MI−) and multiple imputation with inclusion of the outcome (MI+). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared.
RESULTS: Over half the cases had missing data on at least one of the seven variables and 11 percent had missing data on 4 or more. The multi-variate hazard ratio estimates based on multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on the CCA were only slightly different, but the estimates were less precise as the standard errors were large. However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates for MI+ were least biased and most accurate, whereas estimates for CCA were most biased and least accurate.
CONCLUSION: In this study, empirical results from analyses using CCA, MS, MI− and MI+ were similar, although results from CCA were less precise. The results from simulations suggest that in general MI+ is likely to be the best. Given the ease of implementing MI in standard statistical software, the results of MI+ and CCA should be compared in any multi-variate analysis where missing data are a problem.
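The terms used in the abstract (MCAR, MAR, complete case analysis, mean substitution) can be illustrated with a small sketch. This is not the authors' code or data: the variable names (`grade`, `marker`), the missingness probabilities and the sample size are all hypothetical, and a plain mean stands in for the multi-variate Cox regression of the study. It only shows how the two missingness mechanisms differ and what the two simplest handling methods do to a data set.

```python
import random
import statistics


def simulate(n=5000, seed=0):
    """Hypothetical cohort: (grade, marker) pairs, marker correlated with grade."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        grade = rng.choice([1, 2, 3])
        marker = grade + rng.gauss(0.0, 1.0)
        rows.append((grade, marker))
    return rows


def make_missing(rows, mechanism, seed=1):
    """Set marker to None under MCAR or MAR (probabilities are assumptions)."""
    rng = random.Random(seed)
    out = []
    for grade, marker in rows:
        if mechanism == "MCAR":
            # Missingness is unrelated to any variable.
            p = 0.3
        elif mechanism == "MAR":
            # Missingness depends only on the fully observed grade.
            p = {1: 0.1, 2: 0.3, 3: 0.5}[grade]
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((grade, None if rng.random() < p else marker))
    return out


def complete_cases(rows):
    """Complete case analysis (CCA): drop every row with a missing marker."""
    return [(g, m) for g, m in rows if m is not None]


def mean_substitute(rows):
    """Mean substitution (MS): replace missing markers with the observed mean."""
    observed = [m for _, m in rows if m is not None]
    fill = statistics.fmean(observed)
    return [(g, fill if m is None else m) for g, m in rows]


if __name__ == "__main__":
    full = simulate()
    print("full-data mean:", statistics.fmean([m for _, m in full]))
    for mechanism in ("MCAR", "MAR"):
        data = make_missing(full, mechanism)
        cc = [m for _, m in complete_cases(data)]
        ms = [m for _, m in mean_substitute(data)]
        print(mechanism, "CCA n:", len(cc), "CCA mean:", statistics.fmean(cc))
        print(mechanism, "MS variance:", statistics.variance(ms),
              "observed variance:", statistics.variance(cc))
```

In this toy setup, mean substitution keeps every row but shrinks the variance of the marker, because all filled values sit exactly at the observed mean. Complete case analysis discards rows, and under the MAR mechanism assumed here it also shifts the marker mean, because high-grade cases are dropped more often. Multiple imputation, which the abstract favours, is not shown.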
format Text
id pubmed-3049587
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-3049587 2012-02-15 Br J Cancer Molecular Diagnostics Nature Publishing Group 2011-02-15 2011-01-25 /pmc/articles/PMC3049587/ /pubmed/21266980 http://dx.doi.org/10.1038/sj.bjc.6606078 Text en Copyright © 2011 Cancer Research UK https://creativecommons.org/licenses/by/4.0/ This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.
title Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer
topic Molecular Diagnostics