Compute, Storage and Throughput Trade-offs for High-Energy Physics Data Acquisition
Today, the large majority of research insights are gained through computer-aided analyses. Before analysis, data must be acquired and prepared. Depending on the data source, acquisition can be a complex process; in such cases, Data Acquisition Systems (DAQs) are used. Real...
Main author: | Promberger, Laura |
---|---|
Language: | eng |
Published: | 2022 |
Subjects: | Detectors and Experimental Techniques |
Online access: | http://cds.cern.ch/record/2824844 |
_version_ | 1780973733486264320 |
---|---|
author | Promberger, Laura |
author_facet | Promberger, Laura |
author_sort | Promberger, Laura |
collection | CERN |
description | Today, the large majority of research insights are gained through computer-aided analyses. Before analysis, data must be acquired and prepared. Depending on the data source, acquisition can be a complex process; in such cases, Data Acquisition Systems (DAQs) are used. Real-time DAQs pose unique challenges in either latency or throughput. At the European Organization for Nuclear Research (CERN), where High Energy Physics (HEP) experiments collide particles, real-time DAQs are deployed to filter down the vast amount of data (for the LHCb experiment: up to 4 TB/s). When working with large amounts of data, data compression eases limitations in capacity and transfer rates. This work surveys existing compression techniques and evaluates them for real-time HEP DAQs. The first part characterizes popular general-purpose compression algorithms and their performance on ARM aarch64, IBM ppc64le, and Intel x86_64 CPU architectures. Their performance is found to be independent of the underlying CPU architecture, making each architecture a viable choice. Scaling and robustness depend on the degree of simultaneous multithreading (SMT) available: higher SMT degrees scale better but are less robust in performance. In terms of "green" computing, ARM outperforms IBM by a factor of 2.8 and Intel by a factor of 1.3. The second part designs a co-scheduling policy that improves the integration of compression devices. The policy allows efficient and fair distribution of performance between (independent) host and device workloads. It needs only two metrics, power consumption and memory bandwidth, and requires no code changes to the host workload. With only NUMA binding and either polling or interrupts for communication with the device, performance for resource-unsaturated host workloads increases by a factor of 1.5 to 4.0 for the device and by a factor of 1.8 to 2.3 for the host. For resource-saturated host workloads, it increases by a factor of 1.8 to 1.9 for the device but decreases by 0.1 to 0.4 for the host. The third part evaluates two compression techniques that use domain knowledge: Huffman coding and lossy autoencoders. Huffman coding on the original data compresses 40 to 260% better than any tested general-purpose algorithm. Huffman coding on delta-encoded data performs poorly for HEP data. Autoencoders are a popular machine learning technique. Two data representations, including one-hot encoding, and many hyperparameters are tested. However, all configurations turn out to be too lossy. Further technological advances are needed to improve the performance of neural networks with large layers. The last part performs a cost-benefit analysis of the previously presented compression techniques, based on power savings and capital expenses. Applied to the real-time LHCb DAQ, it concludes that only compression accelerators are an economically viable choice. Huffman coding on absolute values achieves a higher compression ratio than any general-purpose solution but is too slow. More research is needed to find a better-fitting compression technique based on domain knowledge. While the context of this work is real-time DAQs in the HEP community with specific requirements and limitations, we believe the results are generic enough to apply to the majority of environments and data characteristics. |
id | cern-2824844 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2022 |
record_format | invenio |
spelling | cern-2824844; 2022-09-20T07:46:19Z; http://cds.cern.ch/record/2824844; eng; Promberger, Laura; Compute, Storage and Throughput Trade-offs for High-Energy Physics Data Acquisition; Detectors and Experimental Techniques; CERN-THESIS-2022-108; oai:cds.cern.ch:2824844; 2022-08-18T09:34:43Z |
spellingShingle | Detectors and Experimental Techniques Promberger, Laura Compute, Storage and Throughput Trade-offs for High-Energy Physics Data Acquisition |
title | Compute, Storage and Throughput Trade-offs for High-Energy Physics Data Acquisition |
title_full | Compute, Storage and Throughput Trade-offs for High-Energy Physics Data Acquisition |
title_fullStr | Compute, Storage and Throughput Trade-offs for High-Energy Physics Data Acquisition |
title_full_unstemmed | Compute, Storage and Throughput Trade-offs for High-Energy Physics Data Acquisition |
title_short | Compute, Storage and Throughput Trade-offs for High-Energy Physics Data Acquisition |
title_sort | compute, storage and throughput trade-offs for high-energy physics data acquisition |
topic | Detectors and Experimental Techniques |
url | http://cds.cern.ch/record/2824844 |
work_keys_str_mv | AT prombergerlaura computestorageandthroughputtradeoffsforhighenergyphysicsdataacquisition |
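
The description compares Huffman coding on the original data with Huffman coding on delta-encoded data. The following is a minimal sketch of how such a comparison could be set up. It is not the thesis's implementation: all function names are hypothetical, the 8-bit symbol width is an assumption, and any input data would be synthetic.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encoded_bits(symbols):
    """Total payload size in bits after Huffman coding (code table not included)."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def compression_ratio(symbols, bits_per_symbol=8):
    """Uncompressed size divided by Huffman-coded size (assumed 8-bit symbols)."""
    return (len(symbols) * bits_per_symbol) / huffman_encoded_bits(symbols)
```

Calling `compression_ratio(data)` and `compression_ratio(delta_encode(data))` on the same sequence shows which representation Huffman coding handles better. Delta encoding helps when neighbouring values are close to each other; the record reports that this was not the case for the HEP data studied.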