Ps and Qs: Quantization-Aware Pruning for Efficient Low Latency Neural Network Inference
Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning and quantization. […]
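As a quick illustration of the two techniques named in the abstract, the sketch below combines PyTorch's built-in magnitude pruning with a hand-rolled uniform fake-quantizer so a layer is trained and evaluated with both effects active. This is not the authors' implementation; the layer name, 6-bit width, and 50% sparsity are illustrative assumptions only.

```python
# Minimal sketch (not the paper's code) of pruning + fake quantization in PyTorch.
# Bit width and sparsity below are arbitrary assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


class FakeQuant(torch.autograd.Function):
    """Uniform symmetric fake quantization with a straight-through gradient."""

    @staticmethod
    def forward(ctx, w, bits):
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Pass gradients straight through the rounding step.
        return grad_output, None


class QuantPrunedLinear(nn.Linear):
    """Linear layer whose (possibly pruned) weights are fake-quantized on the fly."""

    def __init__(self, in_features, out_features, bits=6):
        super().__init__(in_features, out_features)
        self.bits = bits

    def forward(self, x):
        w_q = FakeQuant.apply(self.weight, self.bits)
        return nn.functional.linear(x, w_q, self.bias)


layer = QuantPrunedLinear(16, 64, bits=6)
# Mask out the 50% smallest-magnitude weights; the mask persists through
# further training, so pruning and quantization are learned together.
prune.l1_unstructured(layer, name="weight", amount=0.5)

x = torch.randn(8, 16)
out = layer(x)  # forward pass sees pruned and quantized weights
print(out.shape, float((layer.weight == 0).float().mean()))
```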
Main authors: Hawks, Benjamin; Duarte, Javier; Fraser, Nicholas J.; Pappalardo, Alessandro; Tran, Nhan; Umuroglu, Yaman
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2021
Online access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8299073/
https://www.ncbi.nlm.nih.gov/pubmed/34308339
http://dx.doi.org/10.3389/frai.2021.676564
Similar items
- QONNX: Representing Arbitrary-Precision Quantized Neural Networks
  by: Pappalardo, Alessandro, et al.
  Published: (2022)
- A Synaptic Pruning-Based Spiking Neural Network for Hand-Written Digits Classification
  by: Faghihi, Faramarz, et al.
  Published: (2022)
- Random pruning: channel sparsity by expectation scaling factor
  by: Sun, Chuanmeng, et al.
  Published: (2023)
- A lightweight intrusion detection method for IoT based on deep learning and dynamic quantization
  by: Wang, Zhendong, et al.
  Published: (2023)
- Supply forecasting and profiling of urban supermarket chains based on tensor quantization exponential regression for social governance
  by: Li, Dazhou, et al.
  Published: (2022)