Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services

Big data analytics (BDA) is important for reducing healthcare costs. However, data aggregation, maintenance, integration, translation, analysis, and security/privacy all pose challenges. The study objective to establish an interactive BDA platform with simulated patient data using open-source...

Bibliographic Details
Main Authors: Chrimes, Dillon; Zamani, Hamid
Format: Online Article Text
Language: English
Published: Hindawi 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5742497/
https://www.ncbi.nlm.nih.gov/pubmed/29375652
http://dx.doi.org/10.1155/2017/6120820
_version_ 1783288388841897984
author Chrimes, Dillon
Zamani, Hamid
collection PubMed
description Big data analytics (BDA) is important for reducing healthcare costs. However, data aggregation, maintenance, integration, translation, analysis, and security/privacy all pose challenges. The study objective, to establish an interactive BDA platform with simulated patient data using open-source software technologies, was achieved by constructing a platform framework on the Hadoop Distributed File System (HDFS) using HBase (a key-value NoSQL database). Distributed data structures were generated from benchmarked hospital-specific metadata of nine billion patient records. At optimized iteration, HDFS ingestion of HFiles to HBase store files showed sustained availability over hundreds of iterations; however, completing MapReduce to HBase required a week for 10 TB and a month for three billion (30 TB) indexed patient records. Inconsistencies found in MapReduce limited the capacity to generate and replicate data efficiently. Apache Spark and Drill showed high performance, with high usability for technical support but poor usability for clinical services. Building a hospital system on patient-centric data was challenging with HBase, because not all data profiles were fully integrated with the complex patient-to-hospital relationships. Nevertheless, we recommend using HBase to achieve secure patient data while querying entire hospital volumes in a simplified clinical event model across clinical services.
format Online
Article
Text
id pubmed-5742497
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-5742497 2018-01-28 Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services Chrimes, Dillon Zamani, Hamid Comput Math Methods Med Research Article Hindawi 2017 2017-12-11 /pmc/articles/PMC5742497/ /pubmed/29375652 http://dx.doi.org/10.1155/2017/6120820 Text en Copyright © 2017 Dillon Chrimes and Hamid Zamani.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services
topic Research Article