Cargando…

TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations

BACKGROUND: Work on pharmacovigilance systems using texts from PubMed and Twitter typically target at different elements and use different annotation guidelines resulting in a scenario where there is no comparable set of documents from both Twitter and PubMed annotated in the same manner. OBJECTIVE:...

Descripción completa

Detalles Bibliográficos
Autores principales: Alvaro, Nestor, Miyao, Yusuke, Collier, Nigel
Formato: Online Artículo Texto
Lenguaje:English
Publicado: JMIR Publications 2017
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5438461/
https://www.ncbi.nlm.nih.gov/pubmed/28468748
http://dx.doi.org/10.2196/publichealth.6396
_version_ 1783237767339180032
author Alvaro, Nestor
Miyao, Yusuke
Collier, Nigel
author_facet Alvaro, Nestor
Miyao, Yusuke
Collier, Nigel
author_sort Alvaro, Nestor
collection PubMed
description BACKGROUND: Work on pharmacovigilance systems using texts from PubMed and Twitter typically target at different elements and use different annotation guidelines resulting in a scenario where there is no comparable set of documents from both Twitter and PubMed annotated in the same manner. OBJECTIVE: This study aimed to provide a comparable corpus of texts from PubMed and Twitter that can be used to study drug reports from these two sources of information, allowing researchers in the area of pharmacovigilance using natural language processing (NLP) to perform experiments to better understand the similarities and differences between drug reports in Twitter and PubMed. METHODS: We produced a corpus comprising 1000 tweets and 1000 PubMed sentences selected using the same strategy and annotated at entity level by the same experts (pharmacists) using the same set of guidelines. RESULTS: The resulting corpus, annotated by two pharmacists, comprises semantically correct annotations for a set of drugs, diseases, and symptoms. This corpus contains the annotations for 3144 entities, 2749 relations, and 5003 attributes. CONCLUSIONS: We present a corpus that is unique in its characteristics as this is the first corpus for pharmacovigilance curated from Twitter messages and PubMed sentences using the same data selection and annotation strategies. We believe this corpus will be of particular interest for researchers willing to compare results from pharmacovigilance systems (eg, classifiers and named entity recognition systems) when using data from Twitter and from PubMed. We hope that given the comprehensive set of drug names and the annotated entities and relations, this corpus becomes a standard resource to compare results from different pharmacovigilance studies in the area of NLP.
format Online
Article
Text
id pubmed-5438461
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-54384612017-06-06 TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations Alvaro, Nestor Miyao, Yusuke Collier, Nigel JMIR Public Health Surveill Original Paper BACKGROUND: Work on pharmacovigilance systems using texts from PubMed and Twitter typically target at different elements and use different annotation guidelines resulting in a scenario where there is no comparable set of documents from both Twitter and PubMed annotated in the same manner. OBJECTIVE: This study aimed to provide a comparable corpus of texts from PubMed and Twitter that can be used to study drug reports from these two sources of information, allowing researchers in the area of pharmacovigilance using natural language processing (NLP) to perform experiments to better understand the similarities and differences between drug reports in Twitter and PubMed. METHODS: We produced a corpus comprising 1000 tweets and 1000 PubMed sentences selected using the same strategy and annotated at entity level by the same experts (pharmacists) using the same set of guidelines. RESULTS: The resulting corpus, annotated by two pharmacists, comprises semantically correct annotations for a set of drugs, diseases, and symptoms. This corpus contains the annotations for 3144 entities, 2749 relations, and 5003 attributes. CONCLUSIONS: We present a corpus that is unique in its characteristics as this is the first corpus for pharmacovigilance curated from Twitter messages and PubMed sentences using the same data selection and annotation strategies. We believe this corpus will be of particular interest for researchers willing to compare results from pharmacovigilance systems (eg, classifiers and named entity recognition systems) when using data from Twitter and from PubMed. We hope that given the comprehensive set of drug names and the annotated entities and relations, this corpus becomes a standard resource to compare results from different pharmacovigilance studies in the area of NLP. JMIR Publications 2017-05-03 /pmc/articles/PMC5438461/ /pubmed/28468748 http://dx.doi.org/10.2196/publichealth.6396 Text en ©Nestor Alvaro, Yusuke Miyao, Nigel Collier. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 03.05.2017. http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
spellingShingle Original Paper
Alvaro, Nestor
Miyao, Yusuke
Collier, Nigel
TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations
title TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations
title_full TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations
title_fullStr TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations
title_full_unstemmed TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations
title_short TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations
title_sort twimed: twitter and pubmed comparable corpus of drugs, diseases, symptoms, and their relations
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5438461/
https://www.ncbi.nlm.nih.gov/pubmed/28468748
http://dx.doi.org/10.2196/publichealth.6396
work_keys_str_mv AT alvaronestor twimedtwitterandpubmedcomparablecorpusofdrugsdiseasessymptomsandtheirrelations
AT miyaoyusuke twimedtwitterandpubmedcomparablecorpusofdrugsdiseasessymptomsandtheirrelations
AT colliernigel twimedtwitterandpubmedcomparablecorpusofdrugsdiseasessymptomsandtheirrelations