PET: Parameter-efficient Knowledge Distillation on Transformer
Given a large Transformer model, how can we obtain a small and computationally efficient model which maintains the performance of the original model? Transformers have shown significant performance improvements for many NLP tasks in recent years. However, their large size, expensive computational cost...
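As context for the abstract's topic, the sketch below shows a generic soft-label knowledge distillation loss (the standard Hinton-style formulation), not the PET method itself, whose details are not given in this record. The function name and the `temperature` and `alpha` hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic soft-label knowledge distillation loss (illustration only,
    not the PET objective from the paper).

    Mixes the KL divergence between temperature-softened teacher and
    student distributions with the usual cross-entropy on hard labels.
    """
    # Soften both output distributions with the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL term, scaled by T^2 so its gradient magnitude stays comparable.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Standard supervised cross-entropy on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1.0 - alpha) * ce_term
```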
| Main Authors: | Jeon, Hyojin; Park, Seungcheol; Kim, Jin-Gee; Kang, U. |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | Public Library of Science, 2023 |
| Subjects: | |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10325108/ https://www.ncbi.nlm.nih.gov/pubmed/37410716 http://dx.doi.org/10.1371/journal.pone.0288060 |
Similar Items
- Pea-KD: Parameter-efficient and accurate Knowledge Distillation on BERT
  by: Cho, Ikhyun, et al.
  Published: (2022)
- Self-evolving vision transformer for chest X-ray diagnosis through knowledge distillation
  by: Park, Sangjoon, et al.
  Published: (2022)
- Momentum contrast transformer for COVID-19 diagnosis with knowledge distillation
  by: Dong, Aimei, et al.
  Published: (2023)
- Compressing deep graph convolution network with multi-staged knowledge distillation
  by: Kim, Junghun, et al.
  Published: (2021)
- Communication-efficient federated learning via knowledge distillation
  by: Wu, Chuhan, et al.
  Published: (2022)