
LAD: Layer-Wise Adaptive Distillation for BERT Model Compression

Recent advances with large-scale pre-trained language models (e.g., BERT) have brought significant potential to natural language processing. However, the large model size hinders their use in IoT and edge devices. Several studies have utilized task-specific knowledge distillation to compress the pre...
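The abstract refers to task-specific knowledge distillation, where a small student model learns from a fine-tuned teacher. Below is a minimal sketch of a generic distillation loss in PyTorch, not the paper's LAD method; the function name, temperature, and weighting are illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine a soft-target KL term (teacher vs. student) with hard-label CE."""
    # Soften both distributions with the temperature before comparing them.
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, log_target=True,
                  reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth task labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

if __name__ == "__main__":
    # Toy usage with random logits for a 3-class task.
    student = torch.randn(4, 3, requires_grad=True)
    teacher = torch.randn(4, 3)
    labels = torch.randint(0, 3, (4,))
    loss = distillation_loss(student, teacher, labels)
    loss.backward()
    print(loss.item())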


Bibliographic Details
Main Authors: Lin, Ying-Jia; Chen, Kuan-Yu; Kao, Hung-Yu
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9921705/
https://www.ncbi.nlm.nih.gov/pubmed/36772523
http://dx.doi.org/10.3390/s23031483