Does Twitter language reliably predict heart disease? A commentary on Eichstaedt et al. (2015a)
We comment on Eichstaedt et al.’s (2015a) claim to have shown that language patterns among Twitter users, aggregated at the level of US counties, predicted county-level mortality rates from atherosclerotic heart disease (AHD), with “negative” language being associated with higher rates of death from...
Main authors: | Brown, Nicholas J.L., Coyne, James C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2018 |
Subjects: | Epidemiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6152451/ https://www.ncbi.nlm.nih.gov/pubmed/30258732 http://dx.doi.org/10.7717/peerj.5656 |
_version_ | 1783357367660838912 |
---|---|
author | Brown, Nicholas J.L. Coyne, James C. |
author_facet | Brown, Nicholas J.L. Coyne, James C. |
author_sort | Brown, Nicholas J.L. |
collection | PubMed |
description | We comment on Eichstaedt et al.’s (2015a) claim to have shown that language patterns among Twitter users, aggregated at the level of US counties, predicted county-level mortality rates from atherosclerotic heart disease (AHD), with “negative” language being associated with higher rates of death from AHD and “positive” language associated with lower rates. First, we examine some of Eichstaedt et al.’s apparent assumptions about the nature of AHD, as well as some issues related to the secondary analysis of online data and to considering counties as communities. Next, using the data files supplied by Eichstaedt et al., we reproduce their regression- and correlation-based models, substituting mortality from an alternative cause of death—namely, suicide—as the outcome variable, and observe that the purported associations between “negative” and “positive” language and mortality are reversed when suicide is used as the outcome variable. We identify numerous other conceptual and methodological limitations that call into question the robustness and generalizability of Eichstaedt et al.’s claims, even when these are based on the results of their ridge regression/machine learning model. We conclude that there is no good evidence that analyzing Twitter data in bulk in this way can add anything useful to our ability to understand geographical variation in AHD mortality rates. |
format | Online Article Text |
id | pubmed-6152451 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-61524512018-09-26 Does Twitter language reliably predict heart disease? A commentary on Eichstaedt et al. (2015a) Brown, Nicholas J.L. Coyne, James C. PeerJ Epidemiology We comment on Eichstaedt et al.’s (2015a) claim to have shown that language patterns among Twitter users, aggregated at the level of US counties, predicted county-level mortality rates from atherosclerotic heart disease (AHD), with “negative” language being associated with higher rates of death from AHD and “positive” language associated with lower rates. First, we examine some of Eichstaedt et al.’s apparent assumptions about the nature of AHD, as well as some issues related to the secondary analysis of online data and to considering counties as communities. Next, using the data files supplied by Eichstaedt et al., we reproduce their regression- and correlation-based models, substituting mortality from an alternative cause of death—namely, suicide—as the outcome variable, and observe that the purported associations between “negative” and “positive” language and mortality are reversed when suicide is used as the outcome variable. We identify numerous other conceptual and methodological limitations that call into question the robustness and generalizability of Eichstaedt et al.’s claims, even when these are based on the results of their ridge regression/machine learning model. We conclude that there is no good evidence that analyzing Twitter data in bulk in this way can add anything useful to our ability to understand geographical variation in AHD mortality rates. PeerJ Inc. 
2018-09-21 /pmc/articles/PMC6152451/ /pubmed/30258732 http://dx.doi.org/10.7717/peerj.5656 Text en ©2018 Brown and Coyne http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Epidemiology Brown, Nicholas J.L. Coyne, James C. Does Twitter language reliably predict heart disease? A commentary on Eichstaedt et al. (2015a) |
title | Does Twitter language reliably predict heart disease? A commentary on Eichstaedt et al. (2015a) |
title_full | Does Twitter language reliably predict heart disease? A commentary on Eichstaedt et al. (2015a) |
title_fullStr | Does Twitter language reliably predict heart disease? A commentary on Eichstaedt et al. (2015a) |
title_full_unstemmed | Does Twitter language reliably predict heart disease? A commentary on Eichstaedt et al. (2015a) |
title_short | Does Twitter language reliably predict heart disease? A commentary on Eichstaedt et al. (2015a) |
title_sort | does twitter language reliably predict heart disease? a commentary on eichstaedt et al. (2015a) |
topic | Epidemiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6152451/ https://www.ncbi.nlm.nih.gov/pubmed/30258732 http://dx.doi.org/10.7717/peerj.5656 |
work_keys_str_mv | AT brownnicholasjl doestwitterlanguagereliablypredictheartdiseaseacommentaryoneichstaedtetal2015a AT coynejamesc doestwitterlanguagereliablypredictheartdiseaseacommentaryoneichstaedtetal2015a |