
Pea-KD: Parameter-efficient and accurate Knowledge Distillation on BERT

Knowledge Distillation (KD) is one of the widely known methods for model compression. In essence, KD trains a smaller student model based on a larger teacher model and tries to retain the teacher model’s level of performance as much as possible. However, existing KD methods suffer from the following...
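
For context, the generic KD objective the abstract alludes to is a weighted combination of the usual hard-label cross-entropy and a temperature-scaled soft-label term that matches the student's output distribution to the teacher's. The sketch below illustrates only that standard loss (in the spirit of Hinton et al.), not the Pea-KD method proposed in the paper; the function name, hyperparameters, and PyTorch usage are illustrative assumptions.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Combine hard-label cross-entropy with soft-label distillation (generic KD, not Pea-KD)."""
    # Soften both distributions with the temperature so the teacher's
    # "dark knowledge" (relative probabilities of wrong classes) is exposed.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL term is scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    distill = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce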


Bibliographic Details
Main Authors: Cho, Ikhyun; Kang, U
Format: Online Article (Text)
Language: English
Published: Public Library of Science, 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8856529/
https://www.ncbi.nlm.nih.gov/pubmed/35180258
http://dx.doi.org/10.1371/journal.pone.0263592