
Towards Optimal Compression: Joint Pruning and Quantization

Model compression is instrumental in optimizing deep neural network inference on resource-constrained hardware. The prevailing methods for network compression, namely quantization and pruning, have been shown to enhance efficiency at the cost of performance. Determining the most effective quantizati...


Bibliographic Details

Main Authors: Zandonati, Ben, Bucagu, Glenn, Pol, Adrian Alan, Pierini, Maurizio, Sirkin, Olya, Kopetz, Tal
Language: English
Published: 2023
Subjects:
Online Access: http://cds.cern.ch/record/2856527