rEHR: An R package for manipulating and analysing Electronic Health Record data

Research with structured Electronic Health Records (EHRs) is expanding as data becomes more accessible; analytic methods advance; and the scientific validity of such studies is increasingly accepted. However, data science methodology to enable the rapid searching/extraction, cleaning and analysis of...

Bibliographic Details
Main authors: Springate, David A., Parisi, Rosa, Olier, Ivan, Reeves, David, Kontopantelis, Evangelos
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5323003/
https://www.ncbi.nlm.nih.gov/pubmed/28231289
http://dx.doi.org/10.1371/journal.pone.0171784
_version_ 1782509954822832128
author Springate, David A.
Parisi, Rosa
Olier, Ivan
Reeves, David
Kontopantelis, Evangelos
author_facet Springate, David A.
Parisi, Rosa
Olier, Ivan
Reeves, David
Kontopantelis, Evangelos
author_sort Springate, David A.
collection PubMed
description Research with structured Electronic Health Records (EHRs) is expanding as data becomes more accessible; analytic methods advance; and the scientific validity of such studies is increasingly accepted. However, data science methodology to enable the rapid searching/extraction, cleaning and analysis of these large, often complex, datasets is less well developed. In addition, commonly used software is inadequate, resulting in bottlenecks in research workflows and in obstacles to increased transparency and reproducibility of the research. Preparing a research-ready dataset from EHRs is a complex and time-consuming task requiring substantial data science skills, even for simple designs. In addition, certain aspects of the workflow are computationally intensive, for example extraction of longitudinal data and matching controls to a large cohort, which may take days or even weeks to run using standard software. The rEHR package simplifies and accelerates the process of extracting ready-for-analysis datasets from EHR databases. It has a simple import function to a database backend that greatly accelerates data access times. A set of generic query functions allows users to extract data efficiently without needing detailed knowledge of SQL queries. Longitudinal data extractions can also be made in a single command, making use of parallel processing. The package also contains functions for cutting data by time-varying covariates, matching controls to cases, unit conversion and construction of clinical code lists. There are also functions to synthesise dummy EHR data. The package has been tested with one of the largest primary care EHRs, the Clinical Practice Research Datalink (CPRD), but allows for a common interface to other EHRs. This simplified and accelerated workflow for EHR data extraction results in simpler, cleaner scripts that are more easily debugged, shared and reproduced.
format Online
Article
Text
id pubmed-5323003
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5323003 2017-03-09 rEHR: An R package for manipulating and analysing Electronic Health Record data Springate, David A. Parisi, Rosa Olier, Ivan Reeves, David Kontopantelis, Evangelos PLoS One Research Article Research with structured Electronic Health Records (EHRs) is expanding as data becomes more accessible; analytic methods advance; and the scientific validity of such studies is increasingly accepted. However, data science methodology to enable the rapid searching/extraction, cleaning and analysis of these large, often complex, datasets is less well developed. In addition, commonly used software is inadequate, resulting in bottlenecks in research workflows and in obstacles to increased transparency and reproducibility of the research. Preparing a research-ready dataset from EHRs is a complex and time-consuming task requiring substantial data science skills, even for simple designs. In addition, certain aspects of the workflow are computationally intensive, for example extraction of longitudinal data and matching controls to a large cohort, which may take days or even weeks to run using standard software. The rEHR package simplifies and accelerates the process of extracting ready-for-analysis datasets from EHR databases. It has a simple import function to a database backend that greatly accelerates data access times. A set of generic query functions allows users to extract data efficiently without needing detailed knowledge of SQL queries. Longitudinal data extractions can also be made in a single command, making use of parallel processing. The package also contains functions for cutting data by time-varying covariates, matching controls to cases, unit conversion and construction of clinical code lists. There are also functions to synthesise dummy EHR data. The package has been tested with one of the largest primary care EHRs, the Clinical Practice Research Datalink (CPRD), but allows for a common interface to other EHRs. This simplified and accelerated workflow for EHR data extraction results in simpler, cleaner scripts that are more easily debugged, shared and reproduced. Public Library of Science 2017-02-23 /pmc/articles/PMC5323003/ /pubmed/28231289 http://dx.doi.org/10.1371/journal.pone.0171784 Text en © 2017 Springate et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Springate, David A.
Parisi, Rosa
Olier, Ivan
Reeves, David
Kontopantelis, Evangelos
rEHR: An R package for manipulating and analysing Electronic Health Record data
title rEHR: An R package for manipulating and analysing Electronic Health Record data
title_full rEHR: An R package for manipulating and analysing Electronic Health Record data
title_fullStr rEHR: An R package for manipulating and analysing Electronic Health Record data
title_full_unstemmed rEHR: An R package for manipulating and analysing Electronic Health Record data
title_short rEHR: An R package for manipulating and analysing Electronic Health Record data
title_sort rehr: an r package for manipulating and analysing electronic health record data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5323003/
https://www.ncbi.nlm.nih.gov/pubmed/28231289
http://dx.doi.org/10.1371/journal.pone.0171784
work_keys_str_mv AT springatedavida rehranrpackageformanipulatingandanalysingelectronichealthrecorddata
AT parisirosa rehranrpackageformanipulatingandanalysingelectronichealthrecorddata
AT olierivan rehranrpackageformanipulatingandanalysingelectronichealthrecorddata
AT reevesdavid rehranrpackageformanipulatingandanalysingelectronichealthrecorddata
AT kontopantelisevangelos rehranrpackageformanipulatingandanalysingelectronichealthrecorddata