
Evaluation of freely available data profiling tools for health data research application: a functional evaluation review

OBJECTIVES: To objectively evaluate freely available data profiling software tools using healthcare data. DESIGN: Data profiling tools were evaluated for their capabilities using publicly available information and data sheets. From initial assessment, several underwent further detailed evaluation fo...


Bibliographic Details
Main authors: Gordon, Ben; Fennessy, Clara; Varma, Susheel; Barrett, Jake; McCondochie, Enez; Heritage, Trevor; Duroe, Oenone; Jeffery, Richard; Rajamani, Vishnu; Earlam, Kieran; Banda, Victor; Sebire, Neil
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2022
Subjects: Health Informatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9086620/
https://www.ncbi.nlm.nih.gov/pubmed/35534084
http://dx.doi.org/10.1136/bmjopen-2021-054186
_version_ 1784704045231898624
author Gordon, Ben
Fennessy, Clara
Varma, Susheel
Barrett, Jake
McCondochie, Enez
Heritage, Trevor
Duroe, Oenone
Jeffery, Richard
Rajamani, Vishnu
Earlam, Kieran
Banda, Victor
Sebire, Neil
author_sort Gordon, Ben
collection PubMed
description OBJECTIVES: To objectively evaluate freely available data profiling software tools using healthcare data. DESIGN: Data profiling tools were evaluated for their capabilities using publicly available information and data sheets. From initial assessment, several underwent further detailed evaluation for application on healthcare data using a synthetic dataset of 1000 patients and associated data using a common health data model, and tools scored based on their functionality with this dataset. SETTING: Improving the quality of healthcare data for research use is a priority. Profiling tools can assist by evaluating datasets across a range of quality dimensions. Several freely available software packages with profiling capabilities are available but healthcare organisations often have limited data engineering capability and expertise. PARTICIPANTS: 28 profiling tools, 8 undergoing evaluation on synthetic dataset of 1000 patients. RESULTS: Of 28 potential profiling tools initially identified, 8 showed high potential for applicability with healthcare datasets based on available documentation, of which two performed consistently well for these purposes across multiple tasks including determination of completeness, consistency, uniqueness, validity, accuracy and provision of distribution metrics. CONCLUSIONS: Numerous freely available profiling tools are serviceable for potential use with health datasets, of which at least two demonstrated high performance across a range of technical data quality dimensions based on testing with synthetic health dataset and common data model. The appropriate tool choice depends on factors including underlying organisational infrastructure, level of data engineering and coding expertise, but there are freely available tools helping profile health datasets for research use and inform curation activity.
format Online
Article
Text
id pubmed-9086620
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-9086620 2022-05-20 BMJ Open Health Informatics BMJ Publishing Group 2022-05-09 /pmc/articles/PMC9086620/ /pubmed/35534084 http://dx.doi.org/10.1136/bmjopen-2021-054186 Text en © Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY. Published by BMJ. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
title Evaluation of freely available data profiling tools for health data research application: a functional evaluation review
topic Health Informatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9086620/
https://www.ncbi.nlm.nih.gov/pubmed/35534084
http://dx.doi.org/10.1136/bmjopen-2021-054186