Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data
This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approac...
Main Authors: | Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4784924/ https://www.ncbi.nlm.nih.gov/pubmed/26958859 http://dx.doi.org/10.1371/journal.pone.0150069 |
_version_ | 1782420327669694464 |
---|---|
author | Freire, Sergio Miranda Teodoro, Douglas Wei-Kleiner, Fang Sundvall, Erik Karlsson, Daniel Lambrix, Patrick |
author_facet | Freire, Sergio Miranda Teodoro, Douglas Wei-Kleiner, Fang Sundvall, Erik Karlsson, Daniel Lambrix, Patrick |
author_sort | Freire, Sergio Miranda |
collection | PubMed |
description | This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest. |
format | Online Article Text |
id | pubmed-4784924 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-47849242016-03-23 Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data Freire, Sergio Miranda Teodoro, Douglas Wei-Kleiner, Fang Sundvall, Erik Karlsson, Daniel Lambrix, Patrick PLoS One Research Article This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest. Public Library of Science 2016-03-09 /pmc/articles/PMC4784924/ /pubmed/26958859 http://dx.doi.org/10.1371/journal.pone.0150069 Text en © 2016 Freire et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Freire, Sergio Miranda Teodoro, Douglas Wei-Kleiner, Fang Sundvall, Erik Karlsson, Daniel Lambrix, Patrick Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data |
title | Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data |
title_sort | comparing the performance of nosql approaches for managing archetype-based electronic health record data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4784924/ https://www.ncbi.nlm.nih.gov/pubmed/26958859 http://dx.doi.org/10.1371/journal.pone.0150069 |