
Design of a data processing method for the farmland environmental monitoring based on improved Spark components

With the popularization of big data technology, agricultural data processing systems have become more intelligent. In this study, a data processing method for farmland environmental monitoring based on improved Spark components is designed. It introduces the FAST-Join (Join critical filtering sampli...


Bibliographic Details
Main authors: Tang, Ruipeng, Aridas, Narendra Kumar, Talip, Mohamad Sofian Abu
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10694358/
http://dx.doi.org/10.3389/fdata.2023.1282352
author Tang, Ruipeng
Aridas, Narendra Kumar
Talip, Mohamad Sofian Abu
collection PubMed
description With the spread of big data technology, agricultural data processing systems have become more intelligent. This study designs a data processing method for farmland environmental monitoring based on improved Spark components. It introduces the FAST-Join (Join critical filtering sampling partition optimization) algorithm into the Spark component for equivalence association query optimization, with the aim of improving the operating efficiency of the Spark component and the cluster. The experimental results show that, with Spark optimized by the FAST-Join algorithm, the data written and read during Shuffle amount on average to only 0.958% and 1.384% of the original data volume, and the calculation speed is 202.11% faster than the original. The average data processing time and occupied memory size of the Spark cluster are reported as reduced by 128.22% and 76.75% compared with the originals. The study also compares the cluster performance of the FAST-Join and Equi-join algorithms: the Spark cluster optimized by FAST-Join reduced processing time and occupied memory size by an average of 68.74% and 37.80% relative to Equi-join, which indicates that the FAST-Join algorithm can effectively improve the efficiency of queries across data tables and of cluster computing.
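The abstract attributes most of the gain to shrinking the data that passes through Shuffle before an equality join. The sketch below is not the authors' FAST-Join algorithm and does not use Spark; it is a minimal plain-Python illustration, under stated assumptions, of the general idea of filtering out rows whose join key cannot match before the rows are exchanged. The table contents, the function names `naive_equi_join` and `filtered_equi_join`, and the "shuffled row" counter are all hypothetical.

```python
from collections import defaultdict


def naive_equi_join(left, right):
    """Hash join on the first tuple field.

    Assumption: every input row is counted as shuffled, standing in
    for a join that repartitions both tables by key.
    """
    shuffled = len(left) + len(right)
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    joined = [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]
    return joined, shuffled


def filtered_equi_join(left, right):
    """Same join, but rows with non-matching keys are dropped first.

    Only keys present in both tables can produce output, so the other
    rows never need to be exchanged.
    """
    common = {k for k, _ in left} & {k for k, _ in right}
    left_kept = [(k, v) for k, v in left if k in common]
    right_kept = [(k, v) for k, v in right if k in common]
    return naive_equi_join(left_kept, right_kept)


# Hypothetical data: many sensor readings, few monitored plots.
readings = [(sensor_id, f"reading-{sensor_id}") for sensor_id in range(1000)]
plots = [(sensor_id, f"plot-{sensor_id}") for sensor_id in (3, 42, 999, 5000)]

naive_rows, naive_shuffled = naive_equi_join(readings, plots)
fast_rows, fast_shuffled = filtered_equi_join(readings, plots)

# Share of the original rows that still has to be exchanged.
shuffle_share = fast_shuffled / naive_shuffled
print(f"rows shuffled: {naive_shuffled} -> {fast_shuffled} ({shuffle_share:.3%})")
```

Both functions return the same joined rows; only the number of rows counted as exchanged differs. In a real cluster the set of common keys would itself have to be computed and distributed cheaply (for example with a compact filter structure), which this sketch ignores.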
format Online
Article
Text
id pubmed-10694358
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10694358 2023-12-05 Design of a data processing method for the farmland environmental monitoring based on improved Spark components Tang, Ruipeng Aridas, Narendra Kumar Talip, Mohamad Sofian Abu Front Big Data Big Data Frontiers Media S.A. 2023-11-20 /pmc/articles/PMC10694358/ http://dx.doi.org/10.3389/fdata.2023.1282352 Text en Copyright © 2023 Tang, Aridas and Talip. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Design of a data processing method for the farmland environmental monitoring based on improved Spark components
topic Big Data