Framing Apache Spark in life sciences

Bibliographic details
Main authors: Manconi, Andrea; Gnocchi, Matteo; Milanesi, Luciano; Marullo, Osvaldo; Armano, Giuliano
Format: Online article, text
Language: English
Published: Elsevier, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9958288/
https://www.ncbi.nlm.nih.gov/pubmed/36852030
http://dx.doi.org/10.1016/j.heliyon.2023.e13368
collection PubMed
description Advances in high-throughput and digital technologies have made it necessary to adopt big data approaches for handling complex tasks in the life sciences. However, the shift to big data has confronted researchers with technical and infrastructural challenges in storing, sharing, and analysing it. These tasks require distributed computing systems and algorithms that can process data efficiently. Cutting-edge distributed programming frameworks make it possible to implement flexible algorithms that adapt the computation to the data, running on either on-premises HPC clusters or cloud architectures. In this context, Apache Spark is a very powerful HPC engine for large-scale data processing on clusters. Through specialised libraries, it supports working with structured and relational data, as well as machine learning, graph-based computation, and stream processing. This review article aims to help life sciences researchers understand the features of Apache Spark and assess whether it can be used successfully in their research.
id pubmed-9958288
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9958288 (2023-02-26). Heliyon, Review Article. Elsevier, 2023-02-09. Text, en. © 2023 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Review Article