
A large dataset of semantic ratings and its computational extension

Evidence from psychology and cognitive neuroscience indicates that the human brain’s semantic system contains several specific subsystems, each representing a particular dimension of semantic information. Word ratings on these different semantic dimensions can help investigate the behavioral and neu...


Bibliographic Details
Main authors: Wang, Shaonan, Zhang, Yunhao, Shi, Weiting, Zhang, Guangyao, Zhang, Jiajun, Lin, Nan, Zong, Chengqing
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9950052/
https://www.ncbi.nlm.nih.gov/pubmed/36823158
http://dx.doi.org/10.1038/s41597-023-01995-6
author Wang, Shaonan
Zhang, Yunhao
Shi, Weiting
Zhang, Guangyao
Zhang, Jiajun
Lin, Nan
Zong, Chengqing
collection PubMed
description Evidence from psychology and cognitive neuroscience indicates that the human brain’s semantic system contains several specific subsystems, each representing a particular dimension of semantic information. Word ratings on these different semantic dimensions can help investigate the behavioral and neural impacts of semantic dimensions on language processes and build computational representations of language meaning according to the semantic space of the human cognitive system. Existing semantic rating databases provide ratings for hundreds to thousands of words, which can hardly support a comprehensive semantic analysis of natural texts or speech. This article reports a large database, the Six Semantic Dimension Database (SSDD), which contains subjective ratings for 17,940 commonly used Chinese words on six major semantic dimensions: vision, motor, socialness, emotion, time, and space. Furthermore, using computational models to learn the mapping relations between subjective ratings and word embeddings, we include the estimated semantic ratings for 1,427,992 Chinese and 1,515,633 English words in the SSDD. The SSDD will aid studies on natural language processing, text analysis, and semantic representation in the brain.
format Online
Article
Text
id pubmed-9950052
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9950052 2023-02-25 A large dataset of semantic ratings and its computational extension Wang, Shaonan; Zhang, Yunhao; Shi, Weiting; Zhang, Guangyao; Zhang, Jiajun; Lin, Nan; Zong, Chengqing Sci Data Data Descriptor
Nature Publishing Group UK 2023-02-23 /pmc/articles/PMC9950052/ /pubmed/36823158 http://dx.doi.org/10.1038/s41597-023-01995-6 Text en © The Author(s) 2023, corrected publication 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title A large dataset of semantic ratings and its computational extension
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9950052/
https://www.ncbi.nlm.nih.gov/pubmed/36823158
http://dx.doi.org/10.1038/s41597-023-01995-6