Building a Korean morphological analyzer using two Korean BERT models

A morphological analyzer plays an essential role in identifying functional suffixes of Korean words. The analyzer input and output differ from each other in their length and strings, which can be dealt with by an encoder-decoder architecture. We adopt a Transformer architecture, which is an encoder-decoder architecture with self-attention rather than a recurrent connection, to implement a Korean morphological analyzer. Bidirectional Encoder Representations from Transformers (BERT) is one of the most popular pretrained representation models; it can present an encoded sequence of input words, considering contextual information. We initialize both the Transformer encoder and decoder with two types of Korean BERT, one of which is pretrained with a raw corpus, and the other is pretrained with a morphologically analyzed dataset. Therefore, implementing a Korean morphological analyzer based on Transformer is a fine-tuning process with a relatively small corpus. A series of experiments proved that parameter initialization using pretrained models can alleviate the chronic problem of a lack of training data and reduce the time required for training. In addition, we can determine the number of layers required for the encoder and decoder to optimize the performance of a Korean morphological analyzer.
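
The abstract describes warm-starting both halves of a Transformer encoder-decoder with pretrained Korean BERT weights and then fine-tuning on a morphologically annotated corpus. Below is a minimal sketch of that setup using the Hugging Face EncoderDecoderModel API, not the authors' implementation: the checkpoint paths are placeholders standing in for the two Korean BERTs (one pretrained on a raw corpus for the encoder, one on a morphologically analyzed corpus for the decoder), and the example sentence and morpheme/tag sequence are illustrative only.

```python
# Minimal sketch (not the authors' code): initialize a Transformer
# encoder-decoder from two pretrained BERT checkpoints, as the abstract describes.
from transformers import BertTokenizerFast, EncoderDecoderModel

RAW_BERT = "path/to/korean-bert-raw-corpus"        # placeholder: BERT pretrained on raw Korean text
MORPH_BERT = "path/to/korean-bert-morph-analyzed"  # placeholder: BERT pretrained on analyzed text

# Encoder and decoder weights are copied from the pretrained BERTs; the
# decoder's cross-attention layers are newly initialized and learned during
# fine-tuning on a relatively small annotated corpus.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(RAW_BERT, MORPH_BERT)

src_tokenizer = BertTokenizerFast.from_pretrained(RAW_BERT)
tgt_tokenizer = BertTokenizerFast.from_pretrained(MORPH_BERT)

# Special-token ids the seq2seq wrapper needs for generation.
model.config.decoder_start_token_id = tgt_tokenizer.cls_token_id
model.config.pad_token_id = tgt_tokenizer.pad_token_id
model.config.eos_token_id = tgt_tokenizer.sep_token_id

# One fine-tuning step (schematic): word-level Korean sentence in,
# sequence of morphemes with part-of-speech tags out.
inputs = src_tokenizer("나는 학교에 간다", return_tensors="pt")
labels = tgt_tokenizer("나/NP 는/JX 학교/NNG 에/JKB 가/VV ㄴ다/EF",
                       return_tensors="pt").input_ids
outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
loss = outputs.loss  # minimize this during fine-tuning
```

Because both the encoder and the decoder start from pretrained weights, only the newly added cross-attention parameters are trained from scratch, which is consistent with the abstract's finding that pretrained initialization alleviates the shortage of training data and shortens training.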

Bibliographic Details
Main Authors: Choi, Yong-Seok; Park, Yo-Han; Lee, Kong Joo
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9137944/
https://www.ncbi.nlm.nih.gov/pubmed/35634098
http://dx.doi.org/10.7717/peerj-cs.968

©2022 Choi et al. Published in PeerJ Computer Science (PeerJ Inc., 2022-05-02). This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose, provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.