
Simplifying drug package leaflets written in Spanish by using word embedding

BACKGROUND: Drug Package Leaflets (DPLs) provide information for patients on how to safely use medicines. Pharmaceutical companies are responsible for producing these documents. However, several studies have shown that patients usually have problems in understanding sections describing posology (dos...


Bibliographic Details
Main authors: Segura-Bedmar, Isabel, Martínez, Paloma
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5622567/
https://www.ncbi.nlm.nih.gov/pubmed/28962645
http://dx.doi.org/10.1186/s13326-017-0156-7
_version_ 1783267937050689536
author Segura-Bedmar, Isabel
Martínez, Paloma
author_facet Segura-Bedmar, Isabel
Martínez, Paloma
author_sort Segura-Bedmar, Isabel
collection PubMed
description BACKGROUND: Drug Package Leaflets (DPLs) provide information for patients on how to safely use medicines. Pharmaceutical companies are responsible for producing these documents. However, several studies have shown that patients usually have problems understanding the sections describing posology (dosage quantity and prescription), contraindications and adverse drug reactions. The ultimate goal of this work is to provide an automatic approach that helps these companies write drug package leaflets in easy-to-understand language. Natural language processing has become a powerful tool for improving patient care and advancing medicine because it makes it possible to automatically process the large amount of unstructured information needed for patient care. However, to the best of our knowledge, no research has been done on the automatic simplification of drug package leaflets. In previous work, we proposed using domain terminological resources to gather a set of synonyms for a given target term. A potential drawback of this approach is that it depends heavily on the existence of dictionaries; these are not available for every domain and language and, where they do exist, their coverage is often very limited. To overcome this limitation, we propose the use of word embeddings to identify the simplest synonym for a given term. Word embedding models represent each word in a corpus as a vector in a semantic space. Our approach is based on the assumption that synonyms should have close vectors because they occur in similar contexts. RESULTS: In our evaluation, we used the corpus EasyDPL (Easy Drug Package Leaflets), a collection of 306 leaflets written in Spanish and manually annotated with 1400 adverse drug effects and their simplest synonyms. We focus on leaflets written in Spanish because it is the second most widely spoken language in the world, yet terminological resources are usually scarcer for Spanish than for English. Our experiments show an accuracy of 38.5% using word embeddings. CONCLUSIONS: This work provides a promising approach for simplifying DPLs without using terminological resources or parallel corpora. Moreover, it could be easily adapted to different domains and languages. However, more research is needed to improve our word embedding approach, because it does not yet outperform our previous dictionary-based work.
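The assumption stated in the abstract, that synonyms have close vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors, the Spanish terms, and the function names below are invented for illustration, and a real model would learn vectors with hundreds of dimensions from a large corpus. The sketch covers only the retrieval of candidate synonyms by cosine similarity; how the simplest candidate is then chosen is not described in this record and is left out.

```python
import math

# Hypothetical toy embeddings (values invented for illustration only).
TOY_VECTORS = {
    "cefalea": [0.90, 0.10, 0.05],
    "dolor de cabeza": [0.85, 0.15, 0.10],
    "náuseas": [0.10, 0.90, 0.20],
    "erupción": [0.05, 0.20, 0.95],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_terms(target, vectors, top_n=2):
    """Rank every other term by cosine similarity to the target term."""
    target_vec = vectors[target]
    scored = [
        (term, cosine_similarity(target_vec, vec))
        for term, vec in vectors.items()
        if term != target
    ]
    # Highest similarity first: the closest vectors are the synonym candidates.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


candidates = nearest_terms("cefalea", TOY_VECTORS)
for term, score in candidates:
    print(f"{term}: {score:.3f}")
```

With these toy values, "dolor de cabeza" ranks first for "cefalea" because its vector points in nearly the same direction, while the unrelated terms score much lower.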
format Online
Article
Text
id pubmed-5622567
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-56225672017-10-12 Simplifying drug package leaflets written in Spanish by using word embedding Segura-Bedmar, Isabel Martínez, Paloma J Biomed Semantics Research BioMed Central 2017-09-29 /pmc/articles/PMC5622567/ /pubmed/28962645 http://dx.doi.org/10.1186/s13326-017-0156-7 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Segura-Bedmar, Isabel
Martínez, Paloma
Simplifying drug package leaflets written in Spanish by using word embedding
title Simplifying drug package leaflets written in Spanish by using word embedding
title_full Simplifying drug package leaflets written in Spanish by using word embedding
title_fullStr Simplifying drug package leaflets written in Spanish by using word embedding
title_full_unstemmed Simplifying drug package leaflets written in Spanish by using word embedding
title_short Simplifying drug package leaflets written in Spanish by using word embedding
title_sort simplifying drug package leaflets written in spanish by using word embedding
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5622567/
https://www.ncbi.nlm.nih.gov/pubmed/28962645
http://dx.doi.org/10.1186/s13326-017-0156-7
work_keys_str_mv AT segurabedmarisabel simplifyingdrugpackageleafletswritteninspanishbyusingwordembedding
AT martinezpaloma simplifyingdrugpackageleafletswritteninspanishbyusingwordembedding