LAD: Layer-Wise Adaptive Distillation for BERT Model Compression
Recent advances with large-scale pre-trained language models (e.g., BERT) have brought significant potential to natural language processing. However, the large model size hinders their use in IoT and edge devices. Several studies have utilized task-specific knowledge distillation to compress the pre...
Main Authors:
Format: Online Article Text
Language: English
Published: MDPI, 2023
Subjects:
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9921705/
https://www.ncbi.nlm.nih.gov/pubmed/36772523
http://dx.doi.org/10.3390/s23031483