Using publicly visible social media to build detailed forecasts of civil unrest

We demonstrate how one can generate predictions for several thousand incidents of Latin American civil unrest, often many days in advance, by surfacing informative public posts available on Twitter and Tumblr. The data mining system presented here runs daily and requires no manual intervention. Iden...

Bibliographic Details
Main authors: Compton, Ryan; Lee, Craig; Xu, Jiejun; Artieda-Moncada, Luis; Lu, Tsai-Ching; Silva, Lalindra De; Macy, Michael
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643851/
https://www.ncbi.nlm.nih.gov/pubmed/26594609
http://dx.doi.org/10.1186/s13388-014-0004-6
_version_ 1782400575724322816
author Compton, Ryan
Lee, Craig
Xu, Jiejun
Artieda-Moncada, Luis
Lu, Tsai-Ching
Silva, Lalindra De
Macy, Michael
author_facet Compton, Ryan
Lee, Craig
Xu, Jiejun
Artieda-Moncada, Luis
Lu, Tsai-Ching
Silva, Lalindra De
Macy, Michael
author_sort Compton, Ryan
collection PubMed
description We demonstrate how one can generate predictions for several thousand incidents of Latin American civil unrest, often many days in advance, by surfacing informative public posts available on Twitter and Tumblr. The data mining system presented here runs daily and requires no manual intervention. Informative posts are identified by applying multiple textual and geographic filters to a high-volume data feed of tens of millions of posts per day that have been flagged as public by their authors. Predictions are built by annotating the filtered posts, typically a few dozen per day, with demographic, spatial, and temporal information. Key to our textual filters is the fact that social media posts are necessarily short, which makes it possible to infer the topic simply by searching for co-mentions of typically unrelated terms within the same post (e.g., a future date co-mentioned with an unrest keyword). Additional textual filters then apply a logistic regression classifier trained to recognize accounts belonging to organizations that are likely to announce civil unrest. Geographic filtering is accomplished despite sparsely available GPS information and without relying on sophisticated natural language processing. A geocoding technique that infers non-GPS-known user locations from the locations of their GPS-known friends provides location estimates for 91,984,163 Twitter users at a median error of 6.65 km. We show that announcements of upcoming events tend to localize within a small geographic region, allowing us to forecast event locations that are not explicitly mentioned in the text. We annotate our forecasts with demographic information by searching the collected posts for demographic-specific keywords generated by hand as well as with the aid of DBpedia.
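The co-mention filter described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword list, the Spanish date pattern, and the function name `future_date_comention` are all assumptions made for illustration, and the sketch ignores dates that roll over into the following year.

```python
import re
from datetime import date

# Hypothetical unrest vocabulary; the record does not list the real keywords.
UNREST_KEYWORDS = {"marcha", "protesta", "huelga", "paro", "manifestación"}

# Spanish month names mapped to month numbers.
MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4,
    "mayo": 5, "junio": 6, "julio": 7, "agosto": 8,
    "septiembre": 9, "octubre": 10, "noviembre": 11, "diciembre": 12,
}

# Matches dates written as "15 de agosto" (an assumed, simplified pattern).
DATE_RE = re.compile(r"\b(\d{1,2}) de (" + "|".join(MONTHS) + r")\b")


def future_date_comention(text, posted_on):
    """Return the first future date co-mentioned with an unrest keyword,
    or None if the post does not contain both."""
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))
    if not words & UNREST_KEYWORDS:
        return None  # no unrest keyword, so the post is filtered out
    for day, month in DATE_RE.findall(lowered):
        try:
            candidate = date(posted_on.year, MONTHS[month], int(day))
        except ValueError:
            continue  # impossible date such as "31 de febrero"
        if candidate > posted_on:
            return candidate  # a future date appears in the same post
    return None
```

Because posts are short, requiring both signals in a single post acts as a cheap topic filter: a post that mentions only a date, or only an unrest keyword with a past date, is dropped.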
Our system has been in production since December 2012 and, at the time of this writing, has produced 4,771 distinct forecasts for events across ten Latin American nations. Manual examination of 2,859 posts surfaced by our method revealed that only 108 discussed topics unrelated to civil unrest. Examination of 2,596 forecasts generated between 2013-07-01 and 2013-11-30 found that 1,192 (45.9%) matched the exact date, and fell within a 100 km radius, of a civil unrest event reported in traditional news media. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13388-014-0004-6) contains supplementary material, which is available to authorized users.
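The evaluation criterion stated above (same date and within a 100 km radius of a reported event) can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' evaluation code: the `Event` record, the function names, and the use of the haversine great-circle distance with a mean Earth radius of 6371 km are choices made here for illustration.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed distance model)


@dataclass(frozen=True)
class Event:
    """Hypothetical record for a forecast or a reported unrest event."""
    day: date
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_match(forecast, reported, radius_km=100.0):
    """True if the dates are identical and the locations lie within radius_km."""
    same_day = forecast.day == reported.day
    close = haversine_km(forecast.lat, forecast.lon,
                         reported.lat, reported.lon) <= radius_km
    return same_day and close


def match_rate(forecasts, reported_events):
    """Fraction of forecasts that match at least one reported event."""
    if not forecasts:
        return 0.0
    matched = sum(any(is_match(f, r) for r in reported_events)
                  for f in forecasts)
    return matched / len(forecasts)
```

With the figures reported in the abstract, the match rate is 1,192 / 2,596 ≈ 0.459, and the share of surfaced posts unrelated to civil unrest is 108 / 2,859 ≈ 0.038.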
format Online
Article
Text
id pubmed-4643851
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-4643851 2015-11-19 Using publicly visible social media to build detailed forecasts of civil unrest Compton, Ryan Lee, Craig Xu, Jiejun Artieda-Moncada, Luis Lu, Tsai-Ching Silva, Lalindra De Macy, Michael Secur Inform Research Springer Berlin Heidelberg 2014-09-03 2014 /pmc/articles/PMC4643851/ /pubmed/26594609 http://dx.doi.org/10.1186/s13388-014-0004-6 Text en © Compton et al.; licensee Springer 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
spellingShingle Research
Compton, Ryan
Lee, Craig
Xu, Jiejun
Artieda-Moncada, Luis
Lu, Tsai-Ching
Silva, Lalindra De
Macy, Michael
Using publicly visible social media to build detailed forecasts of civil unrest
title Using publicly visible social media to build detailed forecasts of civil unrest
title_full Using publicly visible social media to build detailed forecasts of civil unrest
title_fullStr Using publicly visible social media to build detailed forecasts of civil unrest
title_full_unstemmed Using publicly visible social media to build detailed forecasts of civil unrest
title_short Using publicly visible social media to build detailed forecasts of civil unrest
title_sort using publicly visible social media to build detailed forecasts of civil unrest
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643851/
https://www.ncbi.nlm.nih.gov/pubmed/26594609
http://dx.doi.org/10.1186/s13388-014-0004-6
work_keys_str_mv AT comptonryan usingpubliclyvisiblesocialmediatobuilddetailedforecastsofcivilunrest
AT leecraig usingpubliclyvisiblesocialmediatobuilddetailedforecastsofcivilunrest
AT xujiejun usingpubliclyvisiblesocialmediatobuilddetailedforecastsofcivilunrest
AT artiedamoncadaluis usingpubliclyvisiblesocialmediatobuilddetailedforecastsofcivilunrest
AT lutsaiching usingpubliclyvisiblesocialmediatobuilddetailedforecastsofcivilunrest
AT silvalalindrade usingpubliclyvisiblesocialmediatobuilddetailedforecastsofcivilunrest
AT macymichael usingpubliclyvisiblesocialmediatobuilddetailedforecastsofcivilunrest