
Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework


Bibliographic Details
Main authors: Li, Zhenlong, Yang, Chaowei, Jin, Baoxuan, Yu, Manzhu, Liu, Kai, Sun, Min, Zhan, Matthew
Format: Online Article Text
Language: English
Published: Public Library of Science, 2015
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351198/
https://www.ncbi.nlm.nih.gov/pubmed/25742012
http://dx.doi.org/10.1371/journal.pone.0116781
Collection: PubMed
Abstract: Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the task is challenging for geoscientists: processing such massive data is both computing- and data-intensive, and the analytics require complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. The framework leverages cloud computing, MapReduce, and Service-Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers, a MapReduce-based algorithm framework is developed to support parallel processing of geoscience data, and a service-oriented workflow architecture is built to support on-demand complex data analytics in the cloud environment. A proof-of-concept prototype was used to test the performance of the framework. Results show that the framework significantly improves the efficiency of big geoscience data analytics by reducing data processing time and simplifying data analysis procedures for geoscientists.
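The MapReduce-based parallel processing mentioned in the abstract can be illustrated with a small, in-memory Python sketch of the map/shuffle/reduce pattern. It assumes, purely for illustration, that each grid cell of a multi-dimensional dataset is a (variable, time step, latitude index, longitude index, value) record and that the analysis is a per-time-step spatial mean. The function names `map_cell`, `shuffle`, `reduce_mean`, and `run_job` are hypothetical; this is not the authors' HBase-backed implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout for one grid cell (an assumption, not the paper's schema):
# (variable name, time step, latitude index, longitude index, value)
Cell = Tuple[str, int, int, int, float]
Key = Tuple[str, int]  # (variable, time step)


def map_cell(cell: Cell) -> Iterator[Tuple[Key, float]]:
    """Map phase: emit the value under a (variable, time step) key.

    The spatial indices are dropped so every cell of one time slice
    ends up in the same reduce group.
    """
    variable, time_step, _lat, _lon, value = cell
    yield (variable, time_step), value


def shuffle(pairs: Iterable[Tuple[Key, float]]) -> Dict[Key, List[float]]:
    """Shuffle phase: group intermediate values by key.

    In a real MapReduce cluster the framework performs this step.
    """
    groups: Dict[Key, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_mean(key: Key, values: List[float]) -> Tuple[Key, float]:
    """Reduce phase: collapse one group into its spatial mean."""
    return key, sum(values) / len(values)


def run_job(cells: Iterable[Cell]) -> Dict[Key, float]:
    """Chain map -> shuffle -> reduce sequentially on a single machine."""
    intermediate = (pair for cell in cells for pair in map_cell(cell))
    grouped = shuffle(intermediate)
    return dict(reduce_mean(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        ("T", 0, 0, 0, 280.0),
        ("T", 0, 0, 1, 282.0),
        ("T", 1, 0, 0, 281.0),
        ("T", 1, 0, 1, 285.0),
    ]
    print(run_job(sample))  # {('T', 0): 281.0, ('T', 1): 283.0}
```

Keying intermediate pairs by (variable, time step) means each reduce group covers one time slice, so in a distributed setting different time slices could be reduced on different nodes independently.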
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), published 2015-03-05
© 2015 Li et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.