A FAIR and AI-ready Higgs boson decay dataset
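Main authors: | Chen, Yifan; Huerta, E. A.; Duarte, Javier; Harris, Philip; Katz, Daniel S.; Neubauer, Mark S.; Diaz, Daniel; Mokhtar, Farouk; Kansal, Raghav; Park, Sang Eon; Kindratenko, Volodymyr V.; Zhao, Zhizhen; Rusack, Roger |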
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8844008/; https://www.ncbi.nlm.nih.gov/pubmed/35165298; http://dx.doi.org/10.1038/s41597-021-01109-0 |
_version_ | 1784651390800363520 |
---|---|
author | Chen, Yifan; Huerta, E. A.; Duarte, Javier; Harris, Philip; Katz, Daniel S.; Neubauer, Mark S.; Diaz, Daniel; Mokhtar, Farouk; Kansal, Raghav; Park, Sang Eon; Kindratenko, Volodymyr V.; Zhao, Zhizhen; Rusack, Roger |
collection | PubMed |
description | To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics. |
format | Online Article Text |
id | pubmed-8844008 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8844008; 2022-03-02; Sci Data; Article; Nature Publishing Group UK; 2022-02-14; /pmc/articles/PMC8844008/; /pubmed/35165298; http://dx.doi.org/10.1038/s41597-021-01109-0; Text; en; © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | A FAIR and AI-ready Higgs boson decay dataset |
topic | Article |