Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer
BACKGROUND: Tissue micro-arrays (TMAs) are increasingly used to generate data of the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are avai...
Main authors: | Ali, A M G; Dawson, S-J; Blows, F M; Provenzano, E; Ellis, I O; Baglietto, L; Huntsman, D; Caldas, C; Pharoah, P D |
---|---|
Format: | Text |
Language: | English |
Published: | Nature Publishing Group, 2011 |
Subjects: | Molecular Diagnostics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3049587/ https://www.ncbi.nlm.nih.gov/pubmed/21266980 http://dx.doi.org/10.1038/sj.bjc.6606078 |
_version_ | 1782199245091110912 |
---|---|
author | Ali, A M G Dawson, S-J Blows, F M Provenzano, E Ellis, I O Baglietto, L Huntsman, D Caldas, C Pharoah, P D |
author_facet | Ali, A M G Dawson, S-J Blows, F M Provenzano, E Ellis, I O Baglietto, L Huntsman, D Caldas, C Pharoah, P D |
author_sort | Ali, A M G |
collection | PubMed |
description | BACKGROUND: Tissue micro-arrays (TMAs) are increasingly used to generate data of the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are available. However, the validity of the various approaches is dependent on the structure of the missing data and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data from a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer. PATIENTS AND METHODS: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multi-variate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), multiple imputation without inclusion of the outcome (MI−) and multiple imputation with inclusion of the outcome (MI+). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared. RESULTS: Over half the cases had missing data on at least one of the seven variables and 11 percent had missing data on 4 or more. The multi-variate hazard ratio estimates based on multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on the CCA were only slightly different, but the estimates were less precise as the standard errors were large. However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates for MI+ were least biased and most accurate, whereas estimates for CCA were most biased and least accurate. CONCLUSION: In this study, empirical results from analyses using CCA, MS, MI− and MI+ were similar, although results from CCA were less precise. The results from simulations suggest that in general MI+ is likely to be the best. Given the ease of implementing MI in standard statistical software, the results of MI+ and CCA should be compared in any multi-variate analysis where missing data are a problem. |
format | Text |
id | pubmed-3049587 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-3049587 2012-02-15 Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer Ali, A M G Dawson, S-J Blows, F M Provenzano, E Ellis, I O Baglietto, L Huntsman, D Caldas, C Pharoah, P D Br J Cancer Molecular Diagnostics Nature Publishing Group 2011-02-15 2011-01-25 /pmc/articles/PMC3049587/ /pubmed/21266980 http://dx.doi.org/10.1038/sj.bjc.6606078 Text en Copyright © 2011 Cancer Research UK https://creativecommons.org/licenses/by/4.0/ This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Molecular Diagnostics Ali, A M G Dawson, S-J Blows, F M Provenzano, E Ellis, I O Baglietto, L Huntsman, D Caldas, C Pharoah, P D Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer |
title | Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer |
title_full | Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer |
title_fullStr | Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer |
title_full_unstemmed | Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer |
title_short | Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer |
title_sort | comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer |
topic | Molecular Diagnostics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3049587/ https://www.ncbi.nlm.nih.gov/pubmed/21266980 http://dx.doi.org/10.1038/sj.bjc.6606078 |