rEHR: An R package for manipulating and analysing Electronic Health Record data
Research with structured Electronic Health Records (EHRs) is expanding as data becomes more accessible; analytic methods advance; and the scientific validity of such studies is increasingly accepted. However, data science methodology to enable the rapid searching/extraction, cleaning and analysis of...
Main authors: | Springate, David A.; Parisi, Rosa; Olier, Ivan; Reeves, David; Kontopantelis, Evangelos |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Public Library of Science
2017
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5323003/ https://www.ncbi.nlm.nih.gov/pubmed/28231289 http://dx.doi.org/10.1371/journal.pone.0171784 |
_version_ | 1782509954822832128 |
---|---|
author | Springate, David A. Parisi, Rosa Olier, Ivan Reeves, David Kontopantelis, Evangelos |
author_facet | Springate, David A. Parisi, Rosa Olier, Ivan Reeves, David Kontopantelis, Evangelos |
author_sort | Springate, David A. |
collection | PubMed |
description | Research with structured Electronic Health Records (EHRs) is expanding as data becomes more accessible; analytic methods advance; and the scientific validity of such studies is increasingly accepted. However, data science methodology to enable the rapid searching/extraction, cleaning and analysis of these large, often complex, datasets is less well developed. In addition, commonly used software is inadequate, resulting in bottlenecks in research workflows and in obstacles to increased transparency and reproducibility of the research. Preparing a research-ready dataset from EHRs is a complex and time-consuming task requiring substantial data science skills, even for simple designs. In addition, certain aspects of the workflow are computationally intensive, for example extraction of longitudinal data and matching controls to a large cohort, which may take days or even weeks to run using standard software. The rEHR package simplifies and accelerates the process of extracting ready-for-analysis datasets from EHR databases. It has a simple import function to a database backend that greatly accelerates data access times. A set of generic query functions allows users to extract data efficiently without needing detailed knowledge of SQL queries. Longitudinal data extractions can also be made in a single command, making use of parallel processing. The package also contains functions for cutting data by time-varying covariates, matching controls to cases, unit conversion and construction of clinical code lists. There are also functions to synthesise dummy EHR data. The package has been tested with one of the largest primary care EHRs, the Clinical Practice Research Datalink (CPRD), but allows for a common interface to other EHRs. This simplified and accelerated workflow for EHR data extraction results in simpler, cleaner scripts that are more easily debugged, shared and reproduced. |
format | Online Article Text |
id | pubmed-5323003 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-53230032017-03-09 rEHR: An R package for manipulating and analysing Electronic Health Record data Springate, David A. Parisi, Rosa Olier, Ivan Reeves, David Kontopantelis, Evangelos PLoS One Research Article Research with structured Electronic Health Records (EHRs) is expanding as data becomes more accessible; analytic methods advance; and the scientific validity of such studies is increasingly accepted. However, data science methodology to enable the rapid searching/extraction, cleaning and analysis of these large, often complex, datasets is less well developed. In addition, commonly used software is inadequate, resulting in bottlenecks in research workflows and in obstacles to increased transparency and reproducibility of the research. Preparing a research-ready dataset from EHRs is a complex and time-consuming task requiring substantial data science skills, even for simple designs. In addition, certain aspects of the workflow are computationally intensive, for example extraction of longitudinal data and matching controls to a large cohort, which may take days or even weeks to run using standard software. The rEHR package simplifies and accelerates the process of extracting ready-for-analysis datasets from EHR databases. It has a simple import function to a database backend that greatly accelerates data access times. A set of generic query functions allows users to extract data efficiently without needing detailed knowledge of SQL queries. Longitudinal data extractions can also be made in a single command, making use of parallel processing. The package also contains functions for cutting data by time-varying covariates, matching controls to cases, unit conversion and construction of clinical code lists. There are also functions to synthesise dummy EHR data. The package has been tested with one of the largest primary care EHRs, the Clinical Practice Research Datalink (CPRD), but allows for a common interface to other EHRs.
This simplified and accelerated workflow for EHR data extraction results in simpler, cleaner scripts that are more easily debugged, shared and reproduced. Public Library of Science 2017-02-23 /pmc/articles/PMC5323003/ /pubmed/28231289 http://dx.doi.org/10.1371/journal.pone.0171784 Text en © 2017 Springate et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Springate, David A. Parisi, Rosa Olier, Ivan Reeves, David Kontopantelis, Evangelos rEHR: An R package for manipulating and analysing Electronic Health Record data |
title | rEHR: An R package for manipulating and analysing Electronic Health Record data |
title_full | rEHR: An R package for manipulating and analysing Electronic Health Record data |
title_fullStr | rEHR: An R package for manipulating and analysing Electronic Health Record data |
title_full_unstemmed | rEHR: An R package for manipulating and analysing Electronic Health Record data |
title_short | rEHR: An R package for manipulating and analysing Electronic Health Record data |
title_sort | rehr: an r package for manipulating and analysing electronic health record data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5323003/ https://www.ncbi.nlm.nih.gov/pubmed/28231289 http://dx.doi.org/10.1371/journal.pone.0171784 |
work_keys_str_mv | AT springatedavida rehranrpackageformanipulatingandanalysingelectronichealthrecorddata AT parisirosa rehranrpackageformanipulatingandanalysingelectronichealthrecorddata AT olierivan rehranrpackageformanipulatingandanalysingelectronichealthrecorddata AT reevesdavid rehranrpackageformanipulatingandanalysingelectronichealthrecorddata AT kontopantelisevangelos rehranrpackageformanipulatingandanalysingelectronichealthrecorddata |