Astronomy and big data: a data clustering approach to identifying uncertain galaxy morphology

With the onset of massive cosmological data collection through media such as the Sloan Digital Sky Survey (SDSS), galaxy classification has been accomplished for the most part with the help of citizen science communities like Galaxy Zoo. Seeking the wisdom of the crowd for such Big Data processing h...

Bibliographic Details
Main authors: Edwards, Kieran Jay; Gaber, Mohamed Medhat
Language: eng
Published: Springer 2014
Subjects: Engineering
Online access: https://dx.doi.org/10.1007/978-3-319-06599-1
http://cds.cern.ch/record/1702349
author Edwards, Kieran Jay
Gaber, Mohamed Medhat
collection CERN
description With the onset of massive cosmological data collection through media such as the Sloan Digital Sky Survey (SDSS), galaxy classification has been accomplished for the most part with the help of citizen science communities like Galaxy Zoo. Seeking the wisdom of the crowd for such Big Data processing has proved extremely beneficial. However, an analysis of one of the Galaxy Zoo morphological classification data sets has shown that a significant majority of all classified galaxies are labelled as “Uncertain”. This book reports on how to use data mining, more specifically clustering, to identify galaxies about which the public has shown some degree of uncertainty as to whether they belong to one morphology type or another. The book shows the importance of transitions between different data mining techniques in an insightful workflow. It demonstrates that clustering makes it possible to identify discriminating features in the analysed data sets, adopting a novel feature selection algorithm called Incremental Feature Selection (IFS). The book shows the use of state-of-the-art classification techniques, Random Forests and Support Vector Machines, to validate the acquired results. It is concluded that a vast majority of these galaxies are, in fact, of spiral morphology, with a small subset potentially consisting of stars, elliptical galaxies or galaxies of other morphological variants.
id cern-1702349
institution European Organization for Nuclear Research
language eng
publishDate 2014
publisher Springer
record_format invenio
title Astronomy and big data: a data clustering approach to identifying uncertain galaxy morphology
topic Engineering
url https://dx.doi.org/10.1007/978-3-319-06599-1
http://cds.cern.ch/record/1702349