
Distributed Fast Self-Organized Maps for Massive Spectrophotometric Data Analysis †



Bibliographic Details
Main authors: Dafonte, Carlos; Garabato, Daniel; Álvarez, Marco A.; Manteiga, Minia
Format: Online article (text)
Language: English
Published: MDPI, 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5982635/
https://www.ncbi.nlm.nih.gov/pubmed/29751580
http://dx.doi.org/10.3390/s18051419

Description: Analyzing huge amounts of data becomes essential in the era of Big Data, where databases are populated with hundreds of Gigabytes that must be processed to extract knowledge. Hence, classical algorithms must be adapted towards distributed computing methodologies that leverage the underlying computational power of these platforms. Here, a parallel, scalable, and optimized design for self-organized maps (SOM) is proposed in order to analyze massive data gathered by the spectrophotometric sensor of the European Space Agency (ESA) Gaia spacecraft, although it could be extrapolated to other domains. The performance comparison between the sequential implementation and the distributed ones based on Apache Hadoop and Apache Spark is an important part of the work, as well as the detailed analysis of the proposed optimizations. Finally, a domain-specific visualization tool to explore astronomical SOMs is presented.
Journal: Sensors (Basel)
Publication date: 2018-05-03
Copyright: © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Collection: PubMed
Institution: National Center for Biotechnology Information
Record ID: pubmed-5982635
Record format: MEDLINE/PubMed