What does Chinese BERT learn about syntactic knowledge?
Pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) have been applied to a wide range of natural language processing (NLP) tasks and obtained significantly positive results. A growing body of research has investigated the reason why BERT is so efficient...
Main Authors: | Zheng, Jianyu; Liu, Ying |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2023 |
Subjects: | Artificial Intelligence |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403162/ https://www.ncbi.nlm.nih.gov/pubmed/37547407 http://dx.doi.org/10.7717/peerj-cs.1478 |
_version_ | 1785085007044280320 |
---|---|
author | Zheng, Jianyu Liu, Ying |
author_facet | Zheng, Jianyu Liu, Ying |
author_sort | Zheng, Jianyu |
collection | PubMed |
description | Pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) have been applied to a wide range of natural language processing (NLP) tasks and obtained significantly positive results. A growing body of research has investigated the reason why BERT is so efficient and what language knowledge BERT is able to learn. However, most of these works focused almost exclusively on English. Few studies have explored the language information, particularly syntactic information, that BERT has learned in Chinese, which is written as sequences of characters. In this study, we adopted some probing methods for identifying syntactic knowledge stored in the attention heads and hidden states of Chinese BERT. The results suggest that some individual heads and combinations of heads do well in encoding corresponding and overall syntactic relations, respectively. The hidden representation of each layer also contained syntactic information to different degrees. We also analyzed the fine-tuned models of Chinese BERT for different tasks, covering all levels. Our results suggest that these fine-tuned models reflect changes in how language structure is preserved. These findings help explain why Chinese BERT can show such large improvements across many language-processing tasks. |
format | Online Article Text |
id | pubmed-10403162 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10403162 2023-08-05 What does Chinese BERT learn about syntactic knowledge? Zheng, Jianyu Liu, Ying PeerJ Comput Sci Artificial Intelligence Pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) have been applied to a wide range of natural language processing (NLP) tasks and obtained significantly positive results. A growing body of research has investigated the reason why BERT is so efficient and what language knowledge BERT is able to learn. However, most of these works focused almost exclusively on English. Few studies have explored the language information, particularly syntactic information, that BERT has learned in Chinese, which is written as sequences of characters. In this study, we adopted some probing methods for identifying syntactic knowledge stored in the attention heads and hidden states of Chinese BERT. The results suggest that some individual heads and combinations of heads do well in encoding corresponding and overall syntactic relations, respectively. The hidden representation of each layer also contained syntactic information to different degrees. We also analyzed the fine-tuned models of Chinese BERT for different tasks, covering all levels. Our results suggest that these fine-tuned models reflect changes in how language structure is preserved. These findings help explain why Chinese BERT can show such large improvements across many language-processing tasks. PeerJ Inc. 2023-07-26 /pmc/articles/PMC10403162/ /pubmed/37547407 http://dx.doi.org/10.7717/peerj-cs.1478 Text en ©2023 Zheng and Liu https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Artificial Intelligence Zheng, Jianyu Liu, Ying What does Chinese BERT learn about syntactic knowledge? |
title | What does Chinese BERT learn about syntactic knowledge? |
title_full | What does Chinese BERT learn about syntactic knowledge? |
title_fullStr | What does Chinese BERT learn about syntactic knowledge? |
title_full_unstemmed | What does Chinese BERT learn about syntactic knowledge? |
title_short | What does Chinese BERT learn about syntactic knowledge? |
title_sort | what does chinese bert learn about syntactic knowledge? |
topic | Artificial Intelligence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403162/ https://www.ncbi.nlm.nih.gov/pubmed/37547407 http://dx.doi.org/10.7717/peerj-cs.1478 |
work_keys_str_mv | AT zhengjianyu whatdoeschinesebertlearnaboutsyntacticknowledge AT liuying whatdoeschinesebertlearnaboutsyntacticknowledge |
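The abstract above mentions probing the attention heads of Chinese BERT for syntactic relations. The snippet below is a minimal sketch of what such a head-level probe can look like in practice, not the authors' actual procedure: it assumes the HuggingFace `transformers` library and the public `bert-base-chinese` checkpoint, and the example sentence, the layer/head indices, and the helper `head_attention_to_governor` are hypothetical placeholders used only for illustration.

```python
# Minimal, illustrative sketch of attention-head probing on Chinese BERT.
# Assumes the HuggingFace `transformers` library and the `bert-base-chinese`
# checkpoint; the sentence, layer/head choice, and helper name are hypothetical.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese", output_attentions=True)
model.eval()

# Chinese BERT tokenizes at the character level, so each character is one token.
sentence = "我爱北京"  # hypothetical example: "I love Beijing"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len); bert-base-chinese has 12 layers x 12 heads.
attentions = outputs.attentions

def head_attention_to_governor(attentions, layer, head, dep_idx, gov_idx):
    """Return how strongly one head attends from a dependent character to its
    (assumed) syntactic governor. The +1 offset skips the [CLS] token."""
    att = attentions[layer][0, head]  # (seq_len, seq_len) attention matrix
    return att[dep_idx + 1, gov_idx + 1].item()

# Example probe: does layer 8, head 3 link the verb 爱 (index 1) to its subject 我 (index 0)?
score = head_attention_to_governor(attentions, layer=8, head=3, dep_idx=1, gov_idx=0)
print(f"attention weight from 爱 to 我 at layer 8, head 3: {score:.4f}")
```

In a full probe of this kind, such per-head scores would be aggregated over a dependency-annotated corpus to see which heads, or combinations of heads, best track particular syntactic relations.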