Validity of using multiple imputation for "unknown" stage at diagnosis in population-based cancer registry data

BACKGROUND: The multiple imputation approach to missing data has been validated by a number of simulation studies by artificially inducing missingness on fully observed stage data under a pre-specified missing data mechanism. However, the validity of multiple imputation has not yet been assessed usi...

Bibliographic Details
Main authors: Luo, Qingwei, Egger, Sam, Yu, Xue Qin, Smith, David P., O’Connell, Dianne L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5487067/
https://www.ncbi.nlm.nih.gov/pubmed/28654653
http://dx.doi.org/10.1371/journal.pone.0180033
_version_ 1783246384631119872
author Luo, Qingwei
Egger, Sam
Yu, Xue Qin
Smith, David P.
O’Connell, Dianne L.
collection PubMed
description BACKGROUND: Multiple imputation (MI) for missing data has been validated in a number of simulation studies that artificially induced missingness on fully observed stage data under a pre-specified missing data mechanism. However, the validity of MI has not yet been assessed using real data. The objective of this study was to assess the validity of using MI for “unknown” prostate cancer stage recorded in the New South Wales Cancer Registry (NSWCR) under real-world conditions. METHODS: Data from the population-based cohort study NSW Prostate Cancer Care and Outcomes Study (PCOS) were linked to 2000–2002 NSWCR data. For cases with “unknown” NSWCR stage, PCOS-stage was extracted from clinical notes. Logistic regression was used to evaluate the missing at random assumption, adjusting for the variables from two imputation models: a basic model including NSWCR variables only, and an enhanced model including the same NSWCR variables together with PCOS primary treatment. Cox regression was used to evaluate the performance of MI. RESULTS: Of the 1864 prostate cancer cases, 32.7% were recorded as having “unknown” NSWCR stage. The missing at random assumption was satisfied when the logistic regression adjusted for the variables in the enhanced model, but not when it adjusted for the basic model variables only. The Cox models using data with imputed stage from either imputation model provided generally similar estimated hazard ratios, but with wider confidence intervals, compared with those derived from analysis of the data with PCOS-stage. However, the complete-case analysis provided considerably higher estimated hazard ratios for the low socio-economic status group and rural areas than those obtained from all other datasets. CONCLUSIONS: Using MI to deal with “unknown” stage data recorded in a population-based cancer registry appears to provide valid estimates. We would recommend a cautious approach to the use of this method elsewhere.
format Online
Article
Text
id pubmed-5487067
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5487067 2017-07-11 Validity of using multiple imputation for "unknown" stage at diagnosis in population-based cancer registry data Luo, Qingwei Egger, Sam Yu, Xue Qin Smith, David P. O’Connell, Dianne L. PLoS One Research Article Public Library of Science 2017-06-27 /pmc/articles/PMC5487067/ /pubmed/28654653 http://dx.doi.org/10.1371/journal.pone.0180033 Text en © 2017 Luo et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Validity of using multiple imputation for "unknown" stage at diagnosis in population-based cancer registry data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5487067/
https://www.ncbi.nlm.nih.gov/pubmed/28654653
http://dx.doi.org/10.1371/journal.pone.0180033