Towards scaling Twitter for digital epidemiology of birth defects

Social media has recently been used to identify and study a small cohort of Twitter users whose pregnancies with birth defect outcomes—the leading cause of infant mortality—could be observed via their publicly available tweets. In this study, we exploit social media on a larger scale by developing n...

Bibliographic Details
Main Authors:	Klein, Ari Z.; Sarker, Abeed; Weissenbacher, Davy; Gonzalez-Hernandez, Graciela
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2019
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6773753/ https://www.ncbi.nlm.nih.gov/pubmed/31583284 http://dx.doi.org/10.1038/s41746-019-0170-5

_version_	1783455945110585344
author	Klein, Ari Z.; Sarker, Abeed; Weissenbacher, Davy; Gonzalez-Hernandez, Graciela
author_sort	Klein, Ari Z.
collection	PubMed
description	Social media has recently been used to identify and study a small cohort of Twitter users whose pregnancies with birth defect outcomes (the leading cause of infant mortality) could be observed via their publicly available tweets. In this study, we exploit social media on a larger scale by developing natural language processing (NLP) methods to automatically detect, among thousands of users, a cohort of mothers reporting that their child has a birth defect. We used 22,999 annotated tweets to train and evaluate supervised machine learning algorithms (feature-engineered and deep learning-based classifiers) that automatically distinguish tweets referring to the user’s pregnancy outcome from tweets that merely mention birth defects. Because 90% of the tweets merely mention birth defects, we experimented with under-sampling and over-sampling approaches to address this class imbalance. An SVM classifier achieved the best performance for the two positive classes: an F1-score of 0.65 for the “defect” class and 0.51 for the “possible defect” class. We deployed the classifier on 20,457 unlabeled tweets that mention birth defects, which helped identify 542 additional users for potential inclusion in our cohort. Contributions of this study include (1) NLP methods for automatically detecting tweets by users reporting their birth defect outcomes, (2) findings that an SVM classifier can outperform a deep neural network-based classifier for highly imbalanced social media data, (3) evidence that automatic classification can be used to identify additional users for potential inclusion in our cohort, and (4) a publicly available corpus for training and evaluating supervised machine learning algorithms.
format	Online Article Text
id	pubmed-6773753
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-6773753 2019-10-03 NPJ Digit Med Article
Nature Publishing Group UK 2019-10-01 /pmc/articles/PMC6773753/ /pubmed/31583284 http://dx.doi.org/10.1038/s41746-019-0170-5 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	Towards scaling Twitter for digital epidemiology of birth defects
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6773753/ https://www.ncbi.nlm.nih.gov/pubmed/31583284 http://dx.doi.org/10.1038/s41746-019-0170-5
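
The abstract mentions two techniques that are easy to illustrate: under-sampling a majority class and scoring each positive class with F1. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy data, and the "non-defect" label for tweets that merely mention birth defects are all hypothetical; only the "defect" and "possible defect" class names come from the record.

```python
import random
from collections import Counter


def f1_for_class(gold, pred, label):
    """Per-class F1: harmonic mean of precision and recall for one label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 to avoid dividing by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def undersample(examples, majority_label, keep_fraction, seed=0):
    """Randomly keep only a fraction of the majority-class examples."""
    rng = random.Random(seed)
    majority = [ex for ex in examples if ex[1] == majority_label]
    minority = [ex for ex in examples if ex[1] != majority_label]
    n_keep = round(len(majority) * keep_fraction)
    return minority + rng.sample(majority, n_keep)


# Toy data (hypothetical): 90% of examples belong to the majority class.
toy = (
    [("tweet", "non-defect")] * 90
    + [("tweet", "defect")] * 6
    + [("tweet", "possible defect")] * 4
)
balanced = undersample(toy, "non-defect", keep_fraction=0.2)
class_counts = Counter(label for _, label in balanced)

# Toy predictions (hypothetical) for scoring.
gold = ["defect", "defect", "non-defect", "possible defect"]
pred = ["defect", "non-defect", "non-defect", "defect"]
defect_f1 = f1_for_class(gold, pred, "defect")
```

Under-sampling is applied to training data only; evaluation scores such as the per-class F1 are normally computed on a held-out set that keeps the original class distribution.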
