
Analysis of erroneous data entries in paper based and electronic data collection

OBJECTIVE: Electronic data collection (EDC) has become a suitable alternative to paper-based data collection (PBDC) in biomedical research, even in resource-poor settings. During a survey in Nepal, data were collected using both systems and data entry errors were compared between the two methods. Collected d...


Bibliographic Details
Main Authors: Ley, Benedikt, Rijal, Komal Raj, Marfurt, Jutta, Adhikari, Naba Raj, Banjara, Megha Raj, Shrestha, Upendra Thapa, Thriemer, Kamala, Price, Ric N., Ghimire, Prakash
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6704619/
https://www.ncbi.nlm.nih.gov/pubmed/31439025
http://dx.doi.org/10.1186/s13104-019-4574-8
_version_ 1783445535826378752
author Ley, Benedikt
Rijal, Komal Raj
Marfurt, Jutta
Adhikari, Naba Raj
Banjara, Megha Raj
Shrestha, Upendra Thapa
Thriemer, Kamala
Price, Ric N.
Ghimire, Prakash
author_sort Ley, Benedikt
collection PubMed
description OBJECTIVE: Electronic data collection (EDC) has become a suitable alternative to paper-based data collection (PBDC) in biomedical research, even in resource-poor settings. During a survey in Nepal, data were collected using both systems and data entry errors were compared between the two methods. Collected data were checked for completeness, values outside of realistic ranges, and internal logic, and date variables were checked for reasonable time frames. Variables were grouped into 5 categories and the number of discordant entries was compared between the two systems, overall and per variable category. RESULTS: Data from 52 variables collected from 358 participants were available. Discrepancies between the two data sets were found in 12.6% of all entries (2352/18,616). Differences between data points were identified in 18.0% (643/3580) of continuous variables, 15.8% (113/716) of time variables, 13.0% (140/1074) of date variables, 12.0% (86/716) of text variables, and 10.9% (1370/12,530) of categorical variables. Overall, 64% (1499/2352) of all discrepancies were due to data omissions; 76.6% (1148/1499) of missing entries were among categorical data. Omissions in PBDC (n = 1002) were twice as frequent as in EDC (n = 497, p < 0.001). Data omissions, specifically among categorical variables, were identified as the greatest source of error. If designed accordingly, EDC can address this shortfall effectively. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13104-019-4574-8) contains supplementary material, which is available to authorized users.
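As an arithmetic cross-check of the figures in the description, the following minimal sketch recomputes the reported percentages from the counts quoted there. The variable and function names are illustrative assumptions and do not come from the article; only the counts are taken from the abstract.

```python
# Counts quoted in the abstract: (discordant entries, total entries) per category.
reported = {
    "continuous": (643, 3580),
    "time": (113, 716),
    "date": (140, 1074),
    "text": (86, 716),
    "categorical": (1370, 12530),
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of discordant entries within each variable category.
per_category = {name: percent(d, n) for name, (d, n) in reported.items()}

# Overall totals, summed across the five categories.
total_discordant = sum(d for d, _ in reported.values())
total_entries = sum(n for _, n in reported.values())
overall = percent(total_discordant, total_entries)

# Omissions per collection method, as quoted in the abstract.
omissions_pbdc, omissions_edc = 1002, 497
omissions = omissions_pbdc + omissions_edc
omission_share = percent(omissions, total_discordant)
categorical_omission_share = percent(1148, omissions)
pbdc_to_edc_ratio = round(omissions_pbdc / omissions_edc, 2)

print(per_category)
print(overall, omission_share, categorical_omission_share, pbdc_to_edc_ratio)
```

Under these counts the category totals sum to 2352 discordant entries out of 18,616 (52 variables × 358 participants), and the omission share computes to 63.7%, which the abstract reports rounded as 64%.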
format Online
Article
Text
id pubmed-6704619
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6704619 2019-08-22 Analysis of erroneous data entries in paper based and electronic data collection Ley, Benedikt Rijal, Komal Raj Marfurt, Jutta Adhikari, Naba Raj Banjara, Megha Raj Shrestha, Upendra Thapa Thriemer, Kamala Price, Ric N. Ghimire, Prakash BMC Res Notes Research Note
BioMed Central 2019-08-22 /pmc/articles/PMC6704619/ /pubmed/31439025 http://dx.doi.org/10.1186/s13104-019-4574-8 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Analysis of erroneous data entries in paper based and electronic data collection
topic Research Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6704619/
https://www.ncbi.nlm.nih.gov/pubmed/31439025
http://dx.doi.org/10.1186/s13104-019-4574-8