
Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large‐Scale Text Corpora


Bibliographic Details
Main Authors: Iordan, Marius Cătălin, Giallanza, Tyler, Ellis, Cameron T., Beckage, Nicole M., Cohen, Jonathan D.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc., 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9285590/
https://www.ncbi.nlm.nih.gov/pubmed/35146779
http://dx.doi.org/10.1111/cogs.13085
Collection: PubMed
Description: Applying machine learning algorithms to automatically infer relationships between concepts from large‐scale collections of documents presents a unique opportunity to investigate at scale how human semantic knowledge is organized, how people use it to make fundamental judgments (“How similar are cats and bears?”), and how these judgments depend on the features that describe concepts (e.g., size, furriness). However, efforts to date have exhibited a substantial discrepancy between algorithm predictions and human empirical judgments. Here, we introduce a novel approach to generating embeddings for this purpose motivated by the idea that semantic context plays a critical role in human judgment. We leverage this idea by constraining the topic or domain from which documents used for generating embeddings are drawn (e.g., referring to the natural world vs. transportation apparatus). Specifically, we trained state‐of‐the‐art machine learning algorithms using contextually‐constrained text corpora (domain‐specific subsets of Wikipedia articles, 50+ million words each) and showed that this procedure greatly improved predictions of empirical similarity judgments and feature ratings of contextually relevant concepts. Furthermore, we describe a novel, computationally tractable method for improving predictions of contextually‐unconstrained embedding models based on dimensionality reduction of their internal representation to a small number of contextually relevant semantic features. By improving the correspondence between predictions derived automatically by machine learning methods using vast amounts of data and more limited, but direct empirical measurements of human judgments, our approach may help leverage the availability of online corpora to better understand the structure of human semantic representations and how people make judgments based on those.
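The abstract describes the second method only at a high level: reducing a contextually-unconstrained embedding model's internal representation to a small number of contextually relevant semantic features. As a minimal illustrative sketch (not the authors' implementation), one plausible reading is a least-squares linear map from generic embedding space onto a handful of human-rated feature dimensions; all data below (concept names, dimensions, ratings) are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 300-d "embeddings" for 6 concepts and human
# ratings on 3 context-relevant features (e.g., size, furriness, speed).
concepts = ["cat", "bear", "wolf", "car", "truck", "train"]
E = rng.normal(size=(6, 300))        # contextually-unconstrained embeddings
F = rng.uniform(0, 1, size=(6, 3))   # empirical feature ratings in [0, 1]

# Least-squares linear map W from embedding space to the low-dimensional
# feature space, minimizing ||E @ W - F||.
W, *_ = np.linalg.lstsq(E, F, rcond=None)
F_hat = E @ W                        # embeddings reduced to 3 features

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Predicted similarity of "cat" and "bear" in the reduced feature space,
# to be compared against empirical similarity judgments.
sim = cosine(F_hat[0], F_hat[1])
print(round(sim, 3))
```

In this toy setting the map is fit and evaluated on the same concepts; any real use would fit on one set of concepts and predict held-out judgments.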
ID: pubmed-9285590
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Cogn Sci (Regular Articles)
Published: 2022-02-11 (online); 2022-02 (issue)
License: © 2022 The Authors. Cognitive Science published by Wiley Periodicals LLC on behalf of Cognitive Science Society (CSS). This is an open access article under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution, and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.