LAD: Layer-Wise Adaptive Distillation for BERT Model Compression
Recent advances with large-scale pre-trained language models (e.g., BERT) have brought significant potential to natural language processing. However, the large model size hinders their use in IoT and edge devices. Several studies have utilized task-specific knowledge distillation to compress the pre...
Main Authors:
Format: Online Article Text
Language: English
Published: MDPI, 2023
Subjects:
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9921705/
https://www.ncbi.nlm.nih.gov/pubmed/36772523
http://dx.doi.org/10.3390/s23031483