
PET: Parameter-efficient Knowledge Distillation on Transformer


Bibliographic Details
Main authors:	Jeon, Hyojin; Park, Seungcheol; Kim, Jin-Gee; Kang, U.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science, 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10325108/ https://www.ncbi.nlm.nih.gov/pubmed/37410716 http://dx.doi.org/10.1371/journal.pone.0288060

author	Jeon, Hyojin; Park, Seungcheol; Kim, Jin-Gee; Kang, U.
collection	PubMed
description	Given a large Transformer model, how can we obtain a small and computationally efficient model that maintains the performance of the original model? Transformers have shown significant performance improvements on many NLP tasks in recent years. However, their large size, expensive computational cost, and long inference time make them challenging to deploy on resource-constrained devices. Existing Transformer compression methods mainly focus on reducing the size of the encoder, ignoring the fact that the decoder accounts for the major portion of the long inference time. In this paper, we propose PET (Parameter-Efficient knowledge distillation on Transformer), an efficient Transformer compression method that reduces the size of both the encoder and the decoder. In PET, we identify and exploit pairs of parameter groups for efficient weight sharing, and employ a warm-up process using a simplified task to increase the gain from knowledge distillation. Extensive experiments on five real-world datasets show that PET outperforms existing methods on machine translation tasks. Specifically, on the IWSLT’14 EN→DE task, PET reduces memory usage by 81.20% and accelerates inference by 45.15% compared to the uncompressed model, with only a minor BLEU score decrease of 0.27.
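The abstract names knowledge distillation (KD) but does not state PET's exact training objective. As a point of reference only, here is a minimal Python sketch of the widely used soft-target KD loss (Hinton et al., 2015). It is not PET's own loss, and the function names, the toy logits, the temperature of 2.0, and the 0.5 weighting are illustrative assumptions.

```python
import math
from typing import List, Sequence

# Generic soft-target KD loss (Hinton et al., 2015); NOT PET's published objective.
# Function names and default hyperparameters are illustrative assumptions.


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Log of a temperature-scaled softmax, computed stably via the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            label: int,
            temperature: float = 2.0,
            alpha: float = 0.5) -> float:
    """Soft-target distillation loss for a single prediction (e.g. one target token).

    Combines:
      * KL(teacher || student) between temperature-softened distributions,
        scaled by temperature**2 so its magnitude stays comparable across T;
      * ordinary cross-entropy of the student against the hard label (T = 1).
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    # KL divergence: sum_i p_t[i] * (log p_t[i] - log p_s[i])
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Hard-label cross-entropy uses the unsoftened student distribution.
    hard_ce = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard_ce


# Toy 4-word vocabulary: the student is pulled toward the teacher's soft targets.
teacher = [4.0, 1.0, 0.5, -2.0]
student = [2.0, 1.5, 0.0, -1.0]
print(kd_loss(student, teacher, label=0))
```

In a translation model this per-token term would typically be averaged over target positions; how PET combines distillation with its warm-up task and weight sharing is described in the paper itself, not in this sketch.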
format	Online Article Text
id	pubmed-10325108
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-10325108 (2023-07-07). PLoS One, Research Article. Public Library of Science, 2023-07-06. /pmc/articles/PMC10325108/ /pubmed/37410716 http://dx.doi.org/10.1371/journal.pone.0288060. Text, en. © 2023 Jeon et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	PET: Parameter-efficient Knowledge Distillation on Transformer
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10325108/ https://www.ncbi.nlm.nih.gov/pubmed/37410716 http://dx.doi.org/10.1371/journal.pone.0288060
