Towards Optimal Compression: Joint Pruning and Quantization
Model compression is instrumental in optimizing deep neural network inference on resource-constrained hardware. The prevailing methods for network compression, namely quantization and pruning, have been shown to enhance efficiency at the cost of performance. Determining the most effective quantizati...
Main Authors: Zandonati, Ben; Bucagu, Glenn; Pol, Adrian Alan; Pierini, Maurizio; Sirkin, Olya; Kopetz, Tal
Language: English
Published: 2023
Online Access: http://cds.cern.ch/record/2856527
Similar Items
- Lightweight Jet Reconstruction and Identification as an Object Detection Task
  by: Pol, Adrian Alan, et al.
  Published: (2022)
- Technical Report of Participation in Higgs Boson Machine Learning Challenge
  by: Ahmad, S. Raza
  Published: (2015)
- Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark
  by: Borras, Hendrik, et al.
  Published: (2022)
- Anomaly Detection With Conditional Variational Autoencoders
  by: Pol, Adrian Alan, et al.
  Published: (2020)
- QONNX: Representing Arbitrary-Precision Quantized Neural Networks
  by: Pappalardo, Alessandro, et al.
  Published: (2022)