
What does Chinese BERT learn about syntactic knowledge?

Pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) have been applied to a wide range of natural language processing (NLP) tasks and obtained significantly positive results. A growing body of research has investigated the reason why BERT is so efficient...


Bibliographic Details
Main Authors: Zheng, Jianyu; Liu, Ying
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403162/
https://www.ncbi.nlm.nih.gov/pubmed/37547407
http://dx.doi.org/10.7717/peerj-cs.1478
_version_ 1785085007044280320
author Zheng, Jianyu
Liu, Ying
author_facet Zheng, Jianyu
Liu, Ying
author_sort Zheng, Jianyu
collection PubMed
description Pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) have been applied to a wide range of natural language processing (NLP) tasks and obtained significantly positive results. A growing body of research has investigated why BERT is so effective and what linguistic knowledge BERT is able to learn. However, most of these works have focused almost exclusively on English. Few studies have explored the linguistic information, particularly syntactic information, that BERT has learned in Chinese, which is written as sequences of characters. In this study, we adopted several probing methods to identify the syntactic knowledge stored in the attention heads and hidden states of Chinese BERT. The results suggest that some individual heads and combinations of heads do well in encoding corresponding and overall syntactic relations, respectively. The hidden representation of each layer also contained syntactic information to different degrees. We also analyzed the fine-tuned models of Chinese BERT for different tasks, covering all linguistic levels. Our results suggest that these fine-tuned models reflect changes in how language structure is preserved. These findings help explain why Chinese BERT can show such large improvements across many language-processing tasks.
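
As an illustration of the kind of attention-head probing the abstract describes, the following is a minimal sketch, not the authors' code, using the off-the-shelf Hugging Face transformers library and the public bert-base-chinese checkpoint. The example sentence and the character pair being probed are hypothetical choices made for this illustration; an actual probe would score every head over an annotated dependency treebank rather than a single pair.

```python
import torch
from transformers import BertModel, BertTokenizer

# Load the public Chinese BERT checkpoint and request attention weights.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese", output_attentions=True)
model.eval()

# bert-base-chinese tokenizes Chinese text character by character.
sentence = "我喜欢自然语言处理"  # "I like natural language processing"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple of 12 layer tensors, each (batch, heads, seq, seq).
attn = torch.stack(outputs.attentions).squeeze(1)  # (layers, heads, seq, seq)

# Probe one hypothetical dependency pair: attention from the verb character
# "欢" (part of 喜欢, "like") to the subject character "我" ("I").
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
src, dst = tokens.index("欢"), tokens.index("我")

scores = attn[:, :, src, dst]  # one attention weight per (layer, head)
layer, head = divmod(int(scores.argmax()), scores.shape[1])
print(f"Strongest head for this pair: layer {layer}, head {head}, "
      f"weight {scores[layer, head].item():.3f}")
```

Repeating this scoring over many annotated head-dependent pairs would show which individual heads track which syntactic relations, in the spirit of the probing analysis summarized above.
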
format Online
Article
Text
id pubmed-10403162
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10403162 2023-08-05 What does Chinese BERT learn about syntactic knowledge? Zheng, Jianyu Liu, Ying PeerJ Comput Sci Artificial Intelligence Pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) have been applied to a wide range of natural language processing (NLP) tasks and obtained significantly positive results. A growing body of research has investigated why BERT is so effective and what linguistic knowledge BERT is able to learn. However, most of these works have focused almost exclusively on English. Few studies have explored the linguistic information, particularly syntactic information, that BERT has learned in Chinese, which is written as sequences of characters. In this study, we adopted several probing methods to identify the syntactic knowledge stored in the attention heads and hidden states of Chinese BERT. The results suggest that some individual heads and combinations of heads do well in encoding corresponding and overall syntactic relations, respectively. The hidden representation of each layer also contained syntactic information to different degrees. We also analyzed the fine-tuned models of Chinese BERT for different tasks, covering all linguistic levels. Our results suggest that these fine-tuned models reflect changes in how language structure is preserved. These findings help explain why Chinese BERT can show such large improvements across many language-processing tasks. PeerJ Inc. 2023-07-26 /pmc/articles/PMC10403162/ /pubmed/37547407 http://dx.doi.org/10.7717/peerj-cs.1478 Text en ©2023 Zheng and Liu https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle Artificial Intelligence
Zheng, Jianyu
Liu, Ying
What does Chinese BERT learn about syntactic knowledge?
title What does Chinese BERT learn about syntactic knowledge?
title_full What does Chinese BERT learn about syntactic knowledge?
title_fullStr What does Chinese BERT learn about syntactic knowledge?
title_full_unstemmed What does Chinese BERT learn about syntactic knowledge?
title_short What does Chinese BERT learn about syntactic knowledge?
title_sort what does chinese bert learn about syntactic knowledge?
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403162/
https://www.ncbi.nlm.nih.gov/pubmed/37547407
http://dx.doi.org/10.7717/peerj-cs.1478
work_keys_str_mv AT zhengjianyu whatdoeschinesebertlearnaboutsyntacticknowledge
AT liuying whatdoeschinesebertlearnaboutsyntacticknowledge