Pea-KD: Parameter-efficient and accurate Knowledge Distillation on BERT
Knowledge Distillation (KD) is one of the widely known methods for model compression. In essence, KD trains a smaller student model based on a larger teacher model and tries to retain the teacher model’s level of performance as much as possible. However, existing KD methods suffer from the following...
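To make the idea in the abstract concrete, below is a minimal sketch of the standard knowledge-distillation objective (soft teacher targets plus hard labels), not the Pea-KD method proposed in the paper; it assumes PyTorch, and the function name, temperature, and weighting factor are illustrative choices.

```python
# Minimal sketch of a standard KD loss (NOT the paper's Pea-KD method).
# temperature and alpha are illustrative assumptions.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    """Combine soft-target distillation loss with the usual hard-label loss."""
    # Soft targets: KL divergence between temperature-softened distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```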
| Main authors | |
|---|---|
| Format | Online Article Text |
| Language | English |
| Published | Public Library of Science, 2022 |
| Subjects | |
| Online access | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8856529/ https://www.ncbi.nlm.nih.gov/pubmed/35180258 http://dx.doi.org/10.1371/journal.pone.0263592 |