
Binning high-dimensional classifier output for HEP analyses through a clustering algorithm



Bibliographic Details
Main author: CMS Collaboration
Language: eng
Published: 2023
Subjects: Detectors and Experimental Techniques
Online access: http://cds.cern.ch/record/2872249
_version_ 1780978593652801536
author CMS Collaboration
author_facet CMS Collaboration
author_sort CMS Collaboration
collection CERN
description The usage of Deep Neural Networks (DNNs) as multi-classifiers is widespread in modern HEP analyses. In standard categorisation methods, the high-dimensional output of the DNN is often reduced to a one-dimensional distribution by exclusively passing the information about the highest class score to the statistical inference method. Correlations to other classes are hereby omitted. Moreover, in common statistical inference tools, the classification values need to be binned, which relies on the researcher's expertise and is often non-trivial. To overcome the challenge of binning multiple dimensions and preserving the correlations of the event-related classification information, we perform K-means clustering on the high-dimensional DNN output to create bins without marginalising any axes. We evaluate our method in the context of a simulated cross section measurement at the CMS experiment, showing an increased expected sensitivity over the standard binning approach.
id cern-2872249
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2023
record_format invenio
spelling cern-2872249 2023-09-25T18:53:32Z http://cds.cern.ch/record/2872249 eng CMS Collaboration Binning high-dimensional classifier output for HEP analyses through a clustering algorithm Detectors and Experimental Techniques The usage of Deep Neural Networks (DNNs) as multi-classifiers is widespread in modern HEP analyses. In standard categorisation methods, the high-dimensional output of the DNN is often reduced to a one-dimensional distribution by exclusively passing the information about the highest class score to the statistical inference method. Correlations to other classes are hereby omitted. Moreover, in common statistical inference tools, the classification values need to be binned, which relies on the researcher's expertise and is often non-trivial. To overcome the challenge of binning multiple dimensions and preserving the correlations of the event-related classification information, we perform K-means clustering on the high-dimensional DNN output to create bins without marginalising any axes. We evaluate our method in the context of a simulated cross section measurement at the CMS experiment, showing an increased expected sensitivity over the standard binning approach. CMS-DP-2023-074 CERN-CMS-DP-2023-074 oai:cds.cern.ch:2872249 2023-05-06
spellingShingle Detectors and Experimental Techniques
CMS Collaboration
Binning high-dimensional classifier output for HEP analyses through a clustering algorithm
title Binning high-dimensional classifier output for HEP analyses through a clustering algorithm
title_full Binning high-dimensional classifier output for HEP analyses through a clustering algorithm
title_fullStr Binning high-dimensional classifier output for HEP analyses through a clustering algorithm
title_full_unstemmed Binning high-dimensional classifier output for HEP analyses through a clustering algorithm
title_short Binning high-dimensional classifier output for HEP analyses through a clustering algorithm
title_sort binning high-dimensional classifier output for hep analyses through a clustering algorithm
topic Detectors and Experimental Techniques
url http://cds.cern.ch/record/2872249
work_keys_str_mv AT cmscollaboration binninghighdimensionalclassifieroutputforhepanalysesthroughaclusteringalgorithm
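
Illustrative note on the description field: the abstract describes running K-means clustering on the full vector of DNN class scores and using the resulting clusters as histogram bins, so that no score axis is marginalised. The record gives no implementation details, so the following is only a minimal sketch of that idea under stated assumptions. The toy scores, the number of bins, and every function name (`toy_scores`, `kmeans`, `nearest`, `bin_yields`) are hypothetical and are not taken from the CMS note. Plain Lloyd iterations stand in for whatever K-means implementation the authors used.

```python
import random
from collections import Counter


def toy_scores(rng, n_events, n_classes):
    """Toy stand-in for DNN output: each row is non-negative and sums to 1."""
    rows = []
    for _ in range(n_events):
        raw = [rng.random() ** 2 for _ in range(n_classes)]
        total = sum(raw)
        rows.append([r / total for r in raw])
    return rows


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the closest centroid; this index serves as the bin number."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))


def kmeans(points, n_bins, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns one centroid per bin."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_bins)]
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for k in range(n_bins):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[k])  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # assignments no longer change
        centroids = updated
    return centroids


def bin_yields(points, centroids):
    """Count events per cluster: a one-dimensional histogram over cluster indices."""
    counts = Counter(nearest(p, centroids) for p in points)
    return [counts.get(k, 0) for k in range(len(centroids))]


rng = random.Random(42)
scores = toy_scores(rng, n_events=500, n_classes=3)  # assumed: 3 classes
centroids = kmeans(scores, n_bins=6)                 # assumed: 6 bins
yields = bin_yields(scores, centroids)
print(yields)
```

In a real analysis the centroids would presumably be fitted once on simulated events and then reused unchanged to assign data and every simulated process to bins, so that all histograms passed to the statistical inference tool share the same binning. That workflow is an assumption here, not something the record states.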