A FAIR and AI-ready Higgs boson decay dataset

To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step...

Bibliographic Details
Main authors: Chen, Yifan, Huerta, E. A., Duarte, Javier, Harris, Philip, Katz, Daniel S., Neubauer, Mark S., Diaz, Daniel, Mokhtar, Farouk, Kansal, Raghav, Park, Sang Eon, Kindratenko, Volodymyr V., Zhao, Zhizhen, Rusack, Roger
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8844008/
https://www.ncbi.nlm.nih.gov/pubmed/35165298
http://dx.doi.org/10.1038/s41597-021-01109-0
_version_ 1784651390800363520
author Chen, Yifan
Huerta, E. A.
Duarte, Javier
Harris, Philip
Katz, Daniel S.
Neubauer, Mark S.
Diaz, Daniel
Mokhtar, Farouk
Kansal, Raghav
Park, Sang Eon
Kindratenko, Volodymyr V.
Zhao, Zhizhen
Rusack, Roger
author_sort Chen, Yifan
collection PubMed
description To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics.
format Online
Article
Text
id pubmed-8844008
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8844008 2022-03-02 A FAIR and AI-ready Higgs boson decay dataset Chen, Yifan Huerta, E. A. Duarte, Javier Harris, Philip Katz, Daniel S. Neubauer, Mark S. Diaz, Daniel Mokhtar, Farouk Kansal, Raghav Park, Sang Eon Kindratenko, Volodymyr V. Zhao, Zhizhen Rusack, Roger Sci Data Article To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics. Nature Publishing Group UK 2022-02-14 /pmc/articles/PMC8844008/ /pubmed/35165298 http://dx.doi.org/10.1038/s41597-021-01109-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Chen, Yifan
Huerta, E. A.
Duarte, Javier
Harris, Philip
Katz, Daniel S.
Neubauer, Mark S.
Diaz, Daniel
Mokhtar, Farouk
Kansal, Raghav
Park, Sang Eon
Kindratenko, Volodymyr V.
Zhao, Zhizhen
Rusack, Roger
A FAIR and AI-ready Higgs boson decay dataset
title A FAIR and AI-ready Higgs boson decay dataset
title_full A FAIR and AI-ready Higgs boson decay dataset
title_fullStr A FAIR and AI-ready Higgs boson decay dataset
title_full_unstemmed A FAIR and AI-ready Higgs boson decay dataset
title_short A FAIR and AI-ready Higgs boson decay dataset
title_sort fair and ai-ready higgs boson decay dataset
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8844008/
https://www.ncbi.nlm.nih.gov/pubmed/35165298
http://dx.doi.org/10.1038/s41597-021-01109-0