Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends

The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so-called “big data” challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and commun...

Bibliographic Details
Main authors:	Mohammed, Emad A; Far, Behrouz H; Naugler, Christopher
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Review
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4224309/ https://www.ncbi.nlm.nih.gov/pubmed/25383096 http://dx.doi.org/10.1186/1756-0381-7-22

_version_	1782343325234233344
author	Mohammed, Emad A; Far, Behrouz H; Naugler, Christopher
author_sort	Mohammed, Emad A
collection	PubMed
description	The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so-called “big data” challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital that big data solutions be multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework, and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphics processing unit (GPU) computing), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage, resulting in reliable data processing by replicating the computing tasks and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. The paper concludes by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields.
format	Online Article Text
id	pubmed-4224309
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4224309 2014-11-08 BioData Min Review BioMed Central 2014-10-29 /pmc/articles/PMC4224309/ /pubmed/25383096 http://dx.doi.org/10.1186/1756-0381-7-22 Text en Copyright © 2014 Mohammed et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends
topic	Review
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4224309/ https://www.ncbi.nlm.nih.gov/pubmed/25383096 http://dx.doi.org/10.1186/1756-0381-7-22
