A late-binding, distributed, NoSQL warehouse for integrating patient data from clinical trials
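Main authors: | Yang, Eric; Scheff, Jeremy D; Shen, Shih C; Farnum, Michael A; Sefton, James; Lobanov, Victor S; Agrafiotis, Dimitris K |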
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6409386/ https://www.ncbi.nlm.nih.gov/pubmed/30854563 http://dx.doi.org/10.1093/database/baz032 |
author | Yang, Eric; Scheff, Jeremy D; Shen, Shih C; Farnum, Michael A; Sefton, James; Lobanov, Victor S; Agrafiotis, Dimitris K |
---|---|
collection | PubMed |
description | Clinical trial data are typically collected through multiple systems developed by different vendors using different technologies and data standards. These data need to be integrated, standardized and transformed for a variety of monitoring and reporting purposes. The need to process large volumes of often inconsistent data in the presence of ever-changing requirements poses a significant technical challenge. As part of a comprehensive clinical data repository, we have developed a data warehouse that integrates patient data from any source, standardizes it and makes it accessible to study teams in a timely manner to support a wide range of analytic tasks for both in-flight and completed studies. Our solution combines Apache HBase (a NoSQL column store), Apache Phoenix (a massively parallel relational query engine) and a user-friendly interface to facilitate efficient loading of large volumes of data under incomplete or ambiguous specifications, utilizing an extract–load–transform design pattern that defers data mapping until query time. This approach allows us to maintain a single copy of the data and transform it dynamically into any desired format without requiring additional storage. Changes to the mapping specifications can be easily introduced and multiple representations of the data can be made available concurrently. Further, by versioning the data and the transformations separately, we can apply historical maps to current data or current maps to historical data, which simplifies the maintenance of data cuts and facilitates interim analyses for adaptive trials. The result is a highly scalable, secure and redundant solution that combines the flexibility of a NoSQL store with the robustness of a relational query engine to support a broad range of applications, including clinical data management, medical review, risk-based monitoring, safety signal detection, post hoc analysis of completed studies and many others. |
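The abstract's central idea is an extract–load–transform (ELT) pattern: records are loaded untransformed, mapping is deferred until query time, and data and mappings are versioned separately so either version of one can be combined with either version of the other. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: in-memory dicts stand in for the HBase storage layer and the Phoenix query layer, and `VersionedStore`, `query`, `MAPS`, the `STUDY01-` prefix and all field names are hypothetical.

```python
"""Hypothetical late-binding (ELT) sketch: raw data snapshots and mapping
specs are versioned independently and combined only at query time.
In-memory dicts stand in for HBase (storage) and Phoenix (querying)."""
from dataclasses import dataclass, field
from typing import Callable, Dict, List

RawRecord = Dict[str, str]
# A mapping spec maps each target column to a function of one raw record.
MappingSpec = Dict[str, Callable[[RawRecord], str]]


@dataclass
class VersionedStore:
    """Hypothetical raw store holding one untransformed snapshot per data version."""
    snapshots: Dict[int, List[RawRecord]] = field(default_factory=dict)

    def load(self, version: int, records: List[RawRecord]) -> None:
        # Extract + Load: keep records exactly as received; no mapping yet.
        self.snapshots[version] = [dict(r) for r in records]


def query(store: VersionedStore, data_version: int, spec: MappingSpec) -> List[Dict[str, str]]:
    """Transform at query time: apply `spec` to one stored data snapshot."""
    return [
        {target: fn(record) for target, fn in spec.items()}
        for record in store.snapshots[data_version]
    ]


# Two hypothetical mapping versions, stored separately from the data.
MAPS: Dict[int, MappingSpec] = {
    1: {
        "subject_id": lambda r: r["subj"],
        "sex": lambda r: r["gender"],
    },
    2: {
        "subject_id": lambda r: "STUDY01-" + r["subj"],
        "sex": lambda r: {"male": "M", "female": "F"}.get(r["gender"].lower(), "U"),
    },
}


if __name__ == "__main__":
    store = VersionedStore()
    store.load(1, [{"subj": "001", "gender": "Male"}])
    store.load(2, [{"subj": "001", "gender": "Male"},
                   {"subj": "002", "gender": "female"}])

    # Historical map (v1) applied to current data (v2).
    print(query(store, 2, MAPS[1]))
    # Current map (v2) applied to historical data (v1).
    print(query(store, 1, MAPS[2]))
```

In this sketch the raw snapshots are never rewritten, so adding a mapping version needs no extra copy of the data; the trade-off is that the transformation is recomputed on every query.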
id | pubmed-6409386 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Database (Oxford) |
published | 2019-03-11 |
license | © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A late-binding, distributed, NoSQL warehouse for integrating patient data from clinical trials |
topic | Original Article |