Methods for dealing with discrepant records in linked population health datasets: a cross-sectional study

Bibliographic Details
Main authors: Roberts, Christine L, Algert, Charles S, Ford, Jane B
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1797010/
https://www.ncbi.nlm.nih.gov/pubmed/17261198
http://dx.doi.org/10.1186/1472-6963-7-12
author Roberts, Christine L
Algert, Charles S
Ford, Jane B
collection PubMed
description BACKGROUND: Linked population health data are increasingly used in epidemiological studies. If data items are reported on more than one dataset, data linkage can reduce the under-ascertainment associated with many population health datasets. However, this raises the possibility of discrepant case reports from different datasets. METHODS: We examined the effect of four methods of classifying discrepant reports from different population health datasets on the estimated prevalence of hypertensive disorders of pregnancy and on the adjusted odds ratios (aOR) for known risk factors. Data were obtained from linked, validated, birth and hospital data for women who gave birth in a New South Wales hospital (Australia) 2000–2002. RESULTS: Among 250173 women with linked data, 238412 (95.3%) women had perfect agreement on the occurrence of hypertension, 1577 (0.6%) had imperfect agreement; 9369 (3.7%) had hypertension reported in only one dataset (under-reporting) and 815 (0.3%) had conflicting types of hypertension. Using only perfect agreement between birth and discharge data resulted in the lowest prevalence rates (0.3% chronic, 5.1% pregnancy hypertension), while including all reports resulted in the highest prevalence rates (1.1% chronic, 8.7% pregnancy hypertension). The higher prevalence rates were generally consistent with international reports. In contrast, perfect agreement gave the highest aOR (95% confidence interval) for known risk factors: risk of chronic hypertension for maternal age ≥40 years was 4.0 (2.9, 5.3) and the risk of pregnancy hypertension for multiple birth was 2.8 (2.5, 3.2). CONCLUSION: The method chosen for classifying discrepant case reports should vary depending on the study question; all reports should be used as part of calculating the range of prevalence estimates, but perfect matches may be best suited to risk factor analyses. These findings are likely to be applicable to the linkage of any specialised health services datasets to population data that include information on diagnoses or procedures.
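The category counts in the abstract, and its contrast between the strictest and the most inclusive classification rule, can be checked with a minimal Python sketch. Only the counts are taken from the abstract; everything else is an assumption for illustration: the hypothetical names `classify`, `is_case` and `prevalence`, the coding of each dataset's report as None, "chronic" or "pregnancy", and the two rules shown (perfect agreement only, and any single report counts). It is not the authors' code, it covers only two of the four methods, and it leaves out "imperfect agreement" because the abstract does not define that category.

```python
from typing import Optional

# Counts copied from the abstract: 250173 women with linked birth and hospital data.
TOTAL = 250173
AGREEMENT_COUNTS = {
    "perfect agreement": 238412,
    "imperfect agreement": 1577,
    "reported in only one dataset": 9369,
    "conflicting types": 815,
}


def percentages(counts: dict[str, int], total: int) -> dict[str, float]:
    """Return each category's share of `total` as a percentage rounded to 1 dp."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Hypothetical coding (not from the paper): each dataset reports None
# (no hypertension), "chronic", or "pregnancy" for a given woman.
Report = Optional[str]


def classify(birth: Report, hospital: Report) -> str:
    """Hypothetical pairwise rule; 'imperfect agreement' is not modelled."""
    if birth == hospital:
        return "perfect agreement"
    if birth is None or hospital is None:
        return "reported in only one dataset"
    return "conflicting types"


def is_case(birth: Report, hospital: Report, kind: str, rule: str) -> bool:
    """Decide whether a woman counts as a case of `kind` under an illustrative rule."""
    if rule == "perfect":  # both datasets must report the same type
        return birth == kind and hospital == kind
    if rule == "all":  # a report of `kind` in either dataset is enough
        return kind in (birth, hospital)
    raise ValueError(f"unknown rule: {rule!r}")


def prevalence(pairs: list[tuple[Report, Report]], kind: str, rule: str) -> float:
    """Percentage of women counted as cases of `kind` under `rule`."""
    if not pairs:
        raise ValueError("no records")
    hits = sum(is_case(b, h, kind, rule) for b, h in pairs)
    return 100 * hits / len(pairs)


if __name__ == "__main__":
    print(sum(AGREEMENT_COUNTS.values()) == TOTAL)  # True: the four groups cover everyone
    print(percentages(AGREEMENT_COUNTS, TOTAL))     # 95.3, 0.6, 3.7, 0.3 as in the abstract
    # Five made-up women, to contrast the two extreme rules.
    toy = [(None, None), (None, None), ("pregnancy", "pregnancy"),
           ("pregnancy", None), ("chronic", "pregnancy")]
    print(prevalence(toy, "pregnancy", "perfect"))  # 20.0
    print(prevalence(toy, "pregnancy", "all"))      # 60.0
```

Under the inclusive rule in this sketch, a woman with conflicting reports counts toward both types; that is one plausible reading of "including all reports", not a detail stated in the abstract.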
format Text
id pubmed-1797010
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1797010 2007-02-13 BMC Health Serv Res Research Article BioMed Central 2007-01-30 /pmc/articles/PMC1797010/ /pubmed/17261198 http://dx.doi.org/10.1186/1472-6963-7-12 Text en Copyright © 2007 Roberts et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Methods for dealing with discrepant records in linked population health datasets: a cross-sectional study
topic Research Article