Data Quality: A Systematic Review of the Biosurveillance Literature

OBJECTIVE: To highlight how data quality has been discussed in the biosurveillance literature in order to identify current gaps in knowledge and areas for future research. INTRODUCTION: Data quality monitoring is necessary for accurate disease surveillance. However it can be challenging, especially...

Bibliographic Details
Main Authors: Reynolds, Tera; Painter, Ian; Streichert, Laura
Format: Online Article Text
Language: English
Published: University of Illinois at Chicago Library 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692854/
_version_ 1782274671450783744
author Reynolds, Tera
Painter, Ian
Streichert, Laura
author_facet Reynolds, Tera
Painter, Ian
Streichert, Laura
author_sort Reynolds, Tera
collection PubMed
description OBJECTIVE: To highlight how data quality has been discussed in the biosurveillance literature in order to identify current gaps in knowledge and areas for future research.
INTRODUCTION: Data quality monitoring is necessary for accurate disease surveillance. However, it can be challenging, especially when “real-time” data are required. Data quality has been broadly defined as the degree to which data are suitable for use by data consumers [1]. When compromised at any point in a health information system, data of low quality can impair the detection of data anomalies, delay the response to emerging health threats [2], and result in inefficient use of staff and financial resources. While the impacts of poor data quality on biosurveillance are largely unknown, and vary depending on field and business processes, the information management literature includes estimates for increased costs amounting to 8–12% of organizational revenue and, in general, poorer decisions that take longer to make [3].
METHODS:
- How has data quality been defined and/or discussed?
- What measurements of data quality have been utilized?
- What methods for monitoring data quality have been utilized?
- What methods have been used to mitigate data quality issues?
- What steps have been taken to improve data quality?
The search included PubMed, ISDS and AMIA Conference Proceedings, and reference lists. PubMed was searched using the terms “data quality,” “biosurveillance,” “information visualization,” “quality control,” “health data,” and “missing data.” The titles and abstracts of all search results were assessed for relevance, and relevant articles were reviewed using the structured matrix.
RESULTS: The completeness of data capture is the most commonly measured dimension of data quality discussed in the literature (other dimensions include timeliness and accuracy). The methods for detecting data quality issues fall into two broad categories: (1) methods for regular monitoring to identify data quality issues and (2) methods that are utilized for ad hoc assessments of data quality. Methods for regular monitoring of data quality are more likely to be automated and focused on visualization, compared with the methods described as part of special evaluations or studies, which tend to include more manual validation. Improving data quality involves the identification and correction of data errors that already exist in the system using either manual or automated data cleansing techniques [4]. Several methods of improving data quality were discussed in the public health surveillance literature, including development of an address verification algorithm that identifies an alternative, valid address [5], and manual correction of the contents of databases [6]. Communication with the data entry personnel or data providers, either on a regular basis (e.g., annual report) or when systematic data entry errors are identified, was mentioned in the literature as the most common step to prevent data quality issues.
CONCLUSIONS: In reviewing the biosurveillance literature in the context of the data quality field, the largest gap appears to be that the data quality methods discussed in the literature are often ad hoc and not consistently implemented. Developing a data quality program to identify the causes of lower-quality health data, address data quality problems, and prevent issues would allow public health departments to conduct biosurveillance more efficiently and effectively and to apply results to improving public health practice.
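The RESULTS above name completeness and timeliness as measured dimensions of data quality but do not say how the reviewed studies computed them. As a rough illustration only, the sketch below shows one common way such measures could be calculated for a batch of surveillance records. The field names, the sample records, and both function definitions are assumptions made for this example and are not taken from the abstract or from any of the studies it reviews.

```python
from datetime import date

# Hypothetical records; field names and values are illustrative assumptions.
records = [
    {"patient_zip": "98101", "chief_complaint": "fever",
     "visit_date": date(2012, 12, 1), "received_date": date(2012, 12, 1)},
    {"patient_zip": "", "chief_complaint": "cough",
     "visit_date": date(2012, 12, 1), "received_date": date(2012, 12, 3)},
    {"patient_zip": "98105", "chief_complaint": None,
     "visit_date": date(2012, 12, 2), "received_date": date(2012, 12, 2)},
    {"patient_zip": None, "chief_complaint": "rash",
     "visit_date": date(2012, 12, 2), "received_date": date(2012, 12, 3)},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness(rows, field):
    """Fraction of rows in which `field` is populated (0.0 for an empty batch)."""
    if not rows:
        return 0.0
    present = sum(1 for row in rows if not is_missing(row.get(field)))
    return present / len(rows)


def mean_reporting_lag_days(rows, event_field, received_field):
    """Mean number of days between the event and receipt of the record.

    Rows missing either date are skipped; returns None if no row qualifies.
    """
    lags = [
        (row[received_field] - row[event_field]).days
        for row in rows
        if not is_missing(row.get(event_field))
        and not is_missing(row.get(received_field))
    ]
    return sum(lags) / len(lags) if lags else None


zip_completeness = completeness(records, "patient_zip")
complaint_completeness = completeness(records, "chief_complaint")
lag = mean_reporting_lag_days(records, "visit_date", "received_date")
print(zip_completeness, complaint_completeness, lag)
```

Run on a schedule, measures like these could feed the kind of regular, automated monitoring the abstract contrasts with ad hoc assessments. Accuracy is left out of the sketch because it requires a reference source to compare against.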
format Online
Article
Text
id pubmed-3692854
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher University of Illinois at Chicago Library
record_format MEDLINE/PubMed
spelling pubmed-36928542013-06-26 Data Quality: A Systematic Review of the Biosurveillance Literature Reynolds, Tera Painter, Ian Streichert, Laura Online J Public Health Inform ISDS 2012 Conference Abstracts
University of Illinois at Chicago Library 2013-04-04 /pmc/articles/PMC3692854/ Text en ©2013 the author(s) http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/ojphi/about/submissions#copyrightNotice This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, not-for-profit purposes.
spellingShingle ISDS 2012 Conference Abstracts
Reynolds, Tera
Painter, Ian
Streichert, Laura
Data Quality: A Systematic Review of the Biosurveillance Literature
title Data Quality: A Systematic Review of the Biosurveillance Literature
title_full Data Quality: A Systematic Review of the Biosurveillance Literature
title_fullStr Data Quality: A Systematic Review of the Biosurveillance Literature
title_full_unstemmed Data Quality: A Systematic Review of the Biosurveillance Literature
title_short Data Quality: A Systematic Review of the Biosurveillance Literature
title_sort data quality: a systematic review of the biosurveillance literature
topic ISDS 2012 Conference Abstracts
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692854/
work_keys_str_mv AT reynoldstera dataqualityasystematicreviewofthebiosurveillanceliterature
AT painterian dataqualityasystematicreviewofthebiosurveillanceliterature
AT streichertlaura dataqualityasystematicreviewofthebiosurveillanceliterature