
Pride, Love, and Twitter Rants: Combining Machine Learning and Qualitative Techniques to Understand What Our Tweets Reveal about Race in the US

Objective: Describe variation in sentiment of tweets using race-related terms and identify themes characterizing the social climate related to race. Methods: We applied a Stochastic Gradient Descent Classifier to conduct sentiment analysis of 1,249,653 US tweets using race-related terms from 2015–20...

Bibliographic Details
Main Authors: Nguyen, Thu T., Criss, Shaniece, Allen, Amani M., Glymour, M. Maria, Phan, Lynn, Trevino, Ryan, Dasari, Shrikha, Nguyen, Quynh C.
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6571562/
https://www.ncbi.nlm.nih.gov/pubmed/31109051
http://dx.doi.org/10.3390/ijerph16101766
author Nguyen, Thu T.
Criss, Shaniece
Allen, Amani M.
Glymour, M. Maria
Phan, Lynn
Trevino, Ryan
Dasari, Shrikha
Nguyen, Quynh C.
collection PubMed
description Objective: Describe variation in sentiment of tweets using race-related terms and identify themes characterizing the social climate related to race. Methods: We applied a Stochastic Gradient Descent Classifier to conduct sentiment analysis of 1,249,653 US tweets using race-related terms from 2015–2016. To evaluate accuracy, manual labels were compared against computer labels for a random subset of 6600 tweets. We conducted qualitative content analysis on a random sample of 2100 tweets. Results: Agreement between computer labels and manual labels was 74%. Tweets referencing Middle Eastern groups (12.5%) or Blacks (13.8%) had the lowest positive sentiment compared to tweets referencing Asians (17.7%) and Hispanics (17.5%). Qualitative content analysis revealed most tweets were represented by the categories: negative sentiment (45%), positive sentiment such as pride in culture (25%), and navigating relationships (15%). While all tweets use one or more race-related terms, negative sentiment tweets which were not derogatory or whose central topic was not about race were common. Conclusion: This study harnesses relatively untapped social media data to develop a novel area-level measure of social context (sentiment scores) and highlights some of the challenges in doing this work. New approaches to measuring the social environment may enhance research on social context and health.
format Online
Article
Text
id pubmed-6571562
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-65715622019-06-18 Pride, Love, and Twitter Rants: Combining Machine Learning and Qualitative Techniques to Understand What Our Tweets Reveal about Race in the US Nguyen, Thu T. Criss, Shaniece Allen, Amani M. Glymour, M. Maria Phan, Lynn Trevino, Ryan Dasari, Shrikha Nguyen, Quynh C. Int J Environ Res Public Health Article Objective: Describe variation in sentiment of tweets using race-related terms and identify themes characterizing the social climate related to race. Methods: We applied a Stochastic Gradient Descent Classifier to conduct sentiment analysis of 1,249,653 US tweets using race-related terms from 2015–2016. To evaluate accuracy, manual labels were compared against computer labels for a random subset of 6600 tweets. We conducted qualitative content analysis on a random sample of 2100 tweets. Results: Agreement between computer labels and manual labels was 74%. Tweets referencing Middle Eastern groups (12.5%) or Blacks (13.8%) had the lowest positive sentiment compared to tweets referencing Asians (17.7%) and Hispanics (17.5%). Qualitative content analysis revealed most tweets were represented by the categories: negative sentiment (45%), positive sentiment such as pride in culture (25%), and navigating relationships (15%). While all tweets use one or more race-related terms, negative sentiment tweets which were not derogatory or whose central topic was not about race were common. Conclusion: This study harnesses relatively untapped social media data to develop a novel area-level measure of social context (sentiment scores) and highlights some of the challenges in doing this work. New approaches to measuring the social environment may enhance research on social context and health. MDPI 2019-05-18 2019-05 /pmc/articles/PMC6571562/ /pubmed/31109051 http://dx.doi.org/10.3390/ijerph16101766 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Pride, Love, and Twitter Rants: Combining Machine Learning and Qualitative Techniques to Understand What Our Tweets Reveal about Race in the US
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6571562/
https://www.ncbi.nlm.nih.gov/pubmed/31109051
http://dx.doi.org/10.3390/ijerph16101766