
Does Twitter language reliably predict heart disease? A commentary on Eichstaedt et al. (2015a)


Bibliographic Details
Main authors: Brown, Nicholas J.L.; Coyne, James C.
Format: Online Article Text
Language: English
Published: PeerJ Inc., 2018
Subjects: Epidemiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6152451/
https://www.ncbi.nlm.nih.gov/pubmed/30258732
http://dx.doi.org/10.7717/peerj.5656
Collection: PubMed
Description: We comment on Eichstaedt et al.’s (2015a) claim to have shown that language patterns among Twitter users, aggregated at the level of US counties, predicted county-level mortality rates from atherosclerotic heart disease (AHD), with “negative” language being associated with higher rates of death from AHD and “positive” language associated with lower rates. First, we examine some of Eichstaedt et al.’s apparent assumptions about the nature of AHD, as well as some issues related to the secondary analysis of online data and to considering counties as communities. Next, using the data files supplied by Eichstaedt et al., we reproduce their regression- and correlation-based models, substituting mortality from an alternative cause of death—namely, suicide—as the outcome variable, and observe that the purported associations between “negative” and “positive” language and mortality are reversed when suicide is used as the outcome variable. We identify numerous other conceptual and methodological limitations that call into question the robustness and generalizability of Eichstaedt et al.’s claims, even when these are based on the results of their ridge regression/machine learning model. We conclude that there is no good evidence that analyzing Twitter data in bulk in this way can add anything useful to our ability to understand geographical variation in AHD mortality rates.
ID: pubmed-6152451
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PeerJ
Date: 2018-09-21
Rights: ©2018 Brown and Coyne. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.