Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data

This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approac...

Bibliographic Details
Main authors: Freire, Sergio Miranda, Teodoro, Douglas, Wei-Kleiner, Fang, Sundvall, Erik, Karlsson, Daniel, Lambrix, Patrick
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4784924/
https://www.ncbi.nlm.nih.gov/pubmed/26958859
http://dx.doi.org/10.1371/journal.pone.0150069
author Freire, Sergio Miranda
Teodoro, Douglas
Wei-Kleiner, Fang
Sundvall, Erik
Karlsson, Daniel
Lambrix, Patrick
collection PubMed
description This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single-machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single-node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest.
format Online
Article
Text
id pubmed-4784924
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4784924 2016-03-23 PLoS One Research Article Public Library of Science 2016-03-09 /pmc/articles/PMC4784924/ /pubmed/26958859 http://dx.doi.org/10.1371/journal.pone.0150069 Text en © 2016 Freire et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data
topic Research Article