
SensiMix: Sensitivity-Aware 8-bit index & 1-bit value mixed precision quantization for BERT compression

Given a pre-trained BERT, how can we compress it into a fast and lightweight model while maintaining its accuracy? Pre-trained language models, such as BERT, are effective for improving the performance of natural language processing (NLP) tasks. However, heavy models like BERT have problems of large memory…


Bibliographic Details
Main Authors: Piao, Tairen; Cho, Ikhyun; Kang, U.
Format: Online Article (Text)
Language: English
Published: Public Library of Science, 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9015158/
https://www.ncbi.nlm.nih.gov/pubmed/35436295
http://dx.doi.org/10.1371/journal.pone.0265621