Collective self-understanding: A linguistic style analysis of naturally occurring text data
Understanding what groups stand for is integral to a diverse array of social processes, ranging from understanding political conflicts to organisational behaviour to promoting public health behaviours. Traditionally, researchers rely on self-report methods such as interviews and surveys to assess gr...
Main authors: | Cork, Alicia; Everson, Richard; Naserian, Elahe; Levine, Mark; Koschate-Reis, Miriam |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9707163/ https://www.ncbi.nlm.nih.gov/pubmed/36443583 http://dx.doi.org/10.3758/s13428-022-02027-8 |
_version_ | 1784840660064403456 |
---|---|
author | Cork, Alicia Everson, Richard Naserian, Elahe Levine, Mark Koschate-Reis, Miriam |
author_facet | Cork, Alicia Everson, Richard Naserian, Elahe Levine, Mark Koschate-Reis, Miriam |
author_sort | Cork, Alicia |
collection | PubMed |
description | Understanding what groups stand for is integral to a diverse array of social processes, ranging from understanding political conflicts to organisational behaviour to promoting public health behaviours. Traditionally, researchers rely on self-report methods such as interviews and surveys to assess groups’ collective self-understandings. Here, we demonstrate the value of using naturally occurring online textual data to map the similarities and differences between real-world groups’ collective self-understandings. We use machine learning algorithms to assess similarities between 15 diverse online groups’ linguistic style, and then use multidimensional scaling to map the groups in two-dimensional space (N = 1,779,098 Reddit comments). We then use agglomerative and k-means clustering techniques to assess how the 15 groups cluster, finding there are four behaviourally distinct group types – vocational, collective action (comprising political and ethnic/religious identities), relational and stigmatised groups, with stigmatised groups having a less distinctive behavioural profile than the other group types. Study 2 is a secondary data analysis where we find strong relationships between the coordinates of each group in multidimensional space and the groups’ values. In Study 3, we demonstrate how this approach can be used to track the development of groups’ collective self-understandings over time. Using transgender Reddit data (N = 1,095,620 comments) as a proof-of-concept, we track the gradual politicisation of the transgender group over the past decade. The automaticity of this methodology renders it advantageous for monitoring multiple online groups simultaneously. This approach has implications for both governmental agencies and social researchers more generally. Future research avenues and applications are discussed. |
format | Online Article Text |
id | pubmed-9707163 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-9707163 2022-11-29 Collective self-understanding: A linguistic style analysis of naturally occurring text data Cork, Alicia Everson, Richard Naserian, Elahe Levine, Mark Koschate-Reis, Miriam Behav Res Methods Article Understanding what groups stand for is integral to a diverse array of social processes, ranging from understanding political conflicts to organisational behaviour to promoting public health behaviours. Traditionally, researchers rely on self-report methods such as interviews and surveys to assess groups’ collective self-understandings. Here, we demonstrate the value of using naturally occurring online textual data to map the similarities and differences between real-world groups’ collective self-understandings. We use machine learning algorithms to assess similarities between 15 diverse online groups’ linguistic style, and then use multidimensional scaling to map the groups in two-dimensional space (N = 1,779,098 Reddit comments). We then use agglomerative and k-means clustering techniques to assess how the 15 groups cluster, finding there are four behaviourally distinct group types – vocational, collective action (comprising political and ethnic/religious identities), relational and stigmatised groups, with stigmatised groups having a less distinctive behavioural profile than the other group types. Study 2 is a secondary data analysis where we find strong relationships between the coordinates of each group in multidimensional space and the groups’ values. In Study 3, we demonstrate how this approach can be used to track the development of groups’ collective self-understandings over time. Using transgender Reddit data (N = 1,095,620 comments) as a proof-of-concept, we track the gradual politicisation of the transgender group over the past decade. The automaticity of this methodology renders it advantageous for monitoring multiple online groups simultaneously. This approach has implications for both governmental agencies and social researchers more generally. Future research avenues and applications are discussed. Springer US 2022-11-28 /pmc/articles/PMC9707163/ /pubmed/36443583 http://dx.doi.org/10.3758/s13428-022-02027-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
title | Collective self-understanding: A linguistic style analysis of naturally occurring text data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9707163/ https://www.ncbi.nlm.nih.gov/pubmed/36443583 http://dx.doi.org/10.3758/s13428-022-02027-8 |