Data cleaning and management protocols for linked perinatal research data: a good practice example from the Smoking MUMS (Maternal Use of Medications and Safety) Study

BACKGROUND: Data cleaning is an important quality assurance in data linkage research studies. This paper presents the data cleaning and preparation process for a large-scale cross-jurisdictional Australian study (the Smoking MUMS Study) to evaluate the utilisation and safety of smoking cessation pha...

Bibliographic Details
Main Authors: Tran, Duong Thuy, Havard, Alys, Jorm, Louisa R.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5504784/
https://www.ncbi.nlm.nih.gov/pubmed/28693435
http://dx.doi.org/10.1186/s12874-017-0385-6
author Tran, Duong Thuy
Havard, Alys
Jorm, Louisa R.
collection PubMed
description BACKGROUND: Data cleaning is an important quality assurance in data linkage research studies. This paper presents the data cleaning and preparation process for a large-scale cross-jurisdictional Australian study (the Smoking MUMS Study) to evaluate the utilisation and safety of smoking cessation pharmacotherapies during pregnancy. METHODS: Perinatal records for all deliveries (2003–2012) in the States of New South Wales (NSW) and Western Australia were linked to State-based data collections including hospital separation, emergency department and death data (mothers and babies) and congenital defect notifications (babies in NSW) by State-based data linkage units. A national data linkage unit linked pharmaceutical dispensing data for the mothers. All linkages were probabilistic. Twenty two steps assessed the uniqueness of records and consistency of items within and across data sources, resolved discrepancies in the linkages between units, and identified women having records in both States. RESULTS: State-based linkages yielded a cohort of 783,471 mothers and 1,232,440 babies. Likely false positive links relating to 3703 mothers were identified. Corrections of baby’s date of birth and age, and parity were made for 43,578 records while 1996 records were flagged as duplicates. Checks for the uniqueness of the matches between State and national linkages detected 3404 ID clusters, suggestive of missed links in the State linkages, and identified 1986 women who had records in both States. CONCLUSIONS: Analysis of content data can identify inaccurate links that cannot be detected by data linkage units that have access to personal identifiers only. Perinatal researchers are encouraged to adopt the methods presented to ensure quality and consistency among studies using linked administrative data. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12874-017-0385-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5504784
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-55047842017-07-12 Data cleaning and management protocols for linked perinatal research data: a good practice example from the Smoking MUMS (Maternal Use of Medications and Safety) Study Tran, Duong Thuy Havard, Alys Jorm, Louisa R. BMC Med Res Methodol Research Article BACKGROUND: Data cleaning is an important quality assurance in data linkage research studies. This paper presents the data cleaning and preparation process for a large-scale cross-jurisdictional Australian study (the Smoking MUMS Study) to evaluate the utilisation and safety of smoking cessation pharmacotherapies during pregnancy. METHODS: Perinatal records for all deliveries (2003–2012) in the States of New South Wales (NSW) and Western Australia were linked to State-based data collections including hospital separation, emergency department and death data (mothers and babies) and congenital defect notifications (babies in NSW) by State-based data linkage units. A national data linkage unit linked pharmaceutical dispensing data for the mothers. All linkages were probabilistic. Twenty two steps assessed the uniqueness of records and consistency of items within and across data sources, resolved discrepancies in the linkages between units, and identified women having records in both States. RESULTS: State-based linkages yielded a cohort of 783,471 mothers and 1,232,440 babies. Likely false positive links relating to 3703 mothers were identified. Corrections of baby’s date of birth and age, and parity were made for 43,578 records while 1996 records were flagged as duplicates. Checks for the uniqueness of the matches between State and national linkages detected 3404 ID clusters, suggestive of missed links in the State linkages, and identified 1986 women who had records in both States. CONCLUSIONS: Analysis of content data can identify inaccurate links that cannot be detected by data linkage units that have access to personal identifiers only. 
Perinatal researchers are encouraged to adopt the methods presented to ensure quality and consistency among studies using linked administrative data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12874-017-0385-6) contains supplementary material, which is available to authorized users. BioMed Central 2017-07-11 /pmc/articles/PMC5504784/ /pubmed/28693435 http://dx.doi.org/10.1186/s12874-017-0385-6 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Data cleaning and management protocols for linked perinatal research data: a good practice example from the Smoking MUMS (Maternal Use of Medications and Safety) Study
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5504784/
https://www.ncbi.nlm.nih.gov/pubmed/28693435
http://dx.doi.org/10.1186/s12874-017-0385-6