
A late-binding, distributed, NoSQL warehouse for integrating patient data from clinical trials

Clinical trial data are typically collected through multiple systems developed by different vendors using different technologies and data standards. These data need to be integrated, standardized and transformed for a variety of monitoring and reporting purposes. The need to process large volumes of...


Bibliographic Details
Main authors: Yang, Eric; Scheff, Jeremy D; Shen, Shih C; Farnum, Michael A; Sefton, James; Lobanov, Victor S; Agrafiotis, Dimitris K
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6409386/
https://www.ncbi.nlm.nih.gov/pubmed/30854563
http://dx.doi.org/10.1093/database/baz032
author Yang, Eric
Scheff, Jeremy D
Shen, Shih C
Farnum, Michael A
Sefton, James
Lobanov, Victor S
Agrafiotis, Dimitris K
collection PubMed
description Clinical trial data are typically collected through multiple systems developed by different vendors using different technologies and data standards. These data need to be integrated, standardized and transformed for a variety of monitoring and reporting purposes. The need to process large volumes of often inconsistent data in the presence of ever-changing requirements poses a significant technical challenge. As part of a comprehensive clinical data repository, we have developed a data warehouse that integrates patient data from any source, standardizes it and makes it accessible to study teams in a timely manner to support a wide range of analytic tasks for both in-flight and completed studies. Our solution combines Apache HBase (a NoSQL column store), Apache Phoenix (a massively parallel relational query engine) and a user-friendly interface to facilitate efficient loading of large volumes of data under incomplete or ambiguous specifications, utilizing an extract–load–transform design pattern that defers data mapping until query time. This approach allows us to maintain a single copy of the data and transform it dynamically into any desired format without requiring additional storage. Changes to the mapping specifications can be easily introduced, and multiple representations of the data can be made available concurrently. Further, by versioning the data and the transformations separately, we can apply historical maps to current data or current maps to historical data, which simplifies the maintenance of data cuts and facilitates interim analyses for adaptive trials. The result is a highly scalable, secure and redundant solution that combines the flexibility of a NoSQL store with the robustness of a relational query engine to support a broad range of applications, including clinical data management, medical review, risk-based monitoring, safety signal detection, post hoc analysis of completed studies and many others.
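The late-binding (extract–load–transform) idea described above can be illustrated with a small in-memory sketch. It is not the authors' HBase/Phoenix implementation: the class `LateBindingStore`, the helper `weight_kg`, the `STUDY1-` prefix and the column names `SUBJID`, `WT`, `WTU`, `USUBJID` are all hypothetical. Raw records are loaded once without transformation, mapping specifications are versioned separately from the data, and a mapping is applied only when a query names both a data version and a map version, so a current map can be applied to a historical data cut and vice versa.

```python
from typing import Callable, Dict, List

# Hypothetical types for this sketch: a raw record is a dict of source fields,
# and a mapping spec maps each target column to a function of the raw record.
RawRecord = Dict[str, str]
MappingSpec = Dict[str, Callable[[RawRecord], object]]


class LateBindingStore:
    """In-memory stand-in (not HBase/Phoenix) for a late-binding warehouse."""

    def __init__(self) -> None:
        # Data loads and mapping specs are versioned independently.
        self.data_versions: List[List[RawRecord]] = []
        self.map_versions: List[MappingSpec] = []

    def load(self, records: List[RawRecord]) -> int:
        """Extract-load: store copies of the raw records untransformed; return the data version."""
        self.data_versions.append([dict(r) for r in records])
        return len(self.data_versions) - 1

    def add_map(self, spec: MappingSpec) -> int:
        """Register a new mapping specification and return its version."""
        self.map_versions.append(dict(spec))
        return len(self.map_versions) - 1

    def query(self, data_version: int, map_version: int) -> List[Dict[str, object]]:
        """Transform at query time: apply the chosen map to the chosen data cut.
        The result is never written back, so the raw data remain a single copy."""
        spec = self.map_versions[map_version]
        return [
            {column: fn(record) for column, fn in spec.items()}
            for record in self.data_versions[data_version]
        ]


def weight_kg(record: RawRecord) -> float:
    """Hypothetical standardization rule: convert pounds to kilograms."""
    value = float(record["WT"])
    return round(value * 0.45359237, 1) if record["WTU"] == "LB" else value


if __name__ == "__main__":
    store = LateBindingStore()
    cut0 = store.load([{"SUBJID": "001", "WT": "150", "WTU": "LB"}])
    cut1 = store.load([
        {"SUBJID": "001", "WT": "150", "WTU": "LB"},
        {"SUBJID": "002", "WT": "70", "WTU": "KG"},
    ])
    map0 = store.add_map({"USUBJID": lambda r: r["SUBJID"], "WEIGHT": lambda r: r["WT"]})
    map1 = store.add_map({"USUBJID": lambda r: "STUDY1-" + r["SUBJID"], "WEIGHT_KG": weight_kg})
    # Current map applied to a historical data cut:
    print(store.query(cut0, map1))  # [{'USUBJID': 'STUDY1-001', 'WEIGHT_KG': 68.0}]
    # Historical map applied to the current data cut:
    print(store.query(cut1, map0))  # [{'USUBJID': '001', 'WEIGHT': '150'}, {'USUBJID': '002', 'WEIGHT': '70'}]
```

Because `query` builds its output on every call and never stores it, any number of representations can coexist over the one stored copy; in this sketch the trade-off is that the transformation cost is paid on each read rather than once at load time.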
format Online
Article
Text
id pubmed-6409386
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6409386; 2019-03-15; Database (Oxford); Original Article; Oxford University Press 2019-03-11; /pmc/articles/PMC6409386/; /pubmed/30854563; http://dx.doi.org/10.1093/database/baz032; Text; en; © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A late-binding, distributed, NoSQL warehouse for integrating patient data from clinical trials
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6409386/
https://www.ncbi.nlm.nih.gov/pubmed/30854563
http://dx.doi.org/10.1093/database/baz032