
CLAD: A corpus-derived Chinese Lexical Association Database

The application of word associations has become increasingly widespread. However, the association norms produced by traditional free association tests tend not to exceed 10,000 stimulus words, making the number of associated words too small to be representative of the overall language. In this study...


Bibliographic Details
Main authors:	Lin, Shu-Yen; Chen, Hsueh-Chih; Chang, Tao-Hsing; Lee, Wei-En; Sung, Yao-Ting
Format:	Online Article Text
Language:	English
Published:	Springer US 2019
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6797702/ https://www.ncbi.nlm.nih.gov/pubmed/31429062 http://dx.doi.org/10.3758/s13428-019-01208-2

author	Lin, Shu-Yen; Chen, Hsueh-Chih; Chang, Tao-Hsing; Lee, Wei-En; Sung, Yao-Ting
author_sort	Lin, Shu-Yen
collection	PubMed
description	The application of word associations has become increasingly widespread. However, the association norms produced by traditional free association tests tend not to exceed 10,000 stimulus words, making the number of associated words too small to be representative of the overall language. In this study we used text corpora totaling over 400 million Chinese words, along with a multitude of association measures, to automatically construct a Chinese Lexical Association Database (CLAD) comprising the lexical association of over 80,000 words. Comparison of the CLAD with a database of traditional Chinese word association norms shows that word associations extracted from large text corpora are similar in strength to those elicited from free association tests but contain a much greater number of associative word pairs. Additionally, the relatively small numbers of participants involved in the creation of traditional norms result in relatively coarse scales of association measurement, whereas the differentiation of association strengths is greatly enhanced in the CLAD. The CLAD provides researchers with a great supplement to traditional word association norms. A query website at www.chinesereadability.net/LexicalAssociation/CLAD/ affords access to the database.
format	Online Article Text
id	pubmed-6797702
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-6797702 2019-11-01 CLAD: A corpus-derived Chinese Lexical Association Database. Lin, Shu-Yen; Chen, Hsueh-Chih; Chang, Tao-Hsing; Lee, Wei-En; Sung, Yao-Ting. Behav Res Methods. Article. Springer US 2019-08-19 2019 /pmc/articles/PMC6797702/ /pubmed/31429062 http://dx.doi.org/10.3758/s13428-019-01208-2 Text en © The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title	CLAD: A corpus-derived Chinese Lexical Association Database
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6797702/ https://www.ncbi.nlm.nih.gov/pubmed/31429062 http://dx.doi.org/10.3758/s13428-019-01208-2
