Combining word embeddings to extract chemical and drug entities in biomedical literature

BACKGROUND: Natural language processing (NLP) and text mining technologies for the extraction and indexing of chemical and drug entities are key to improving the access and integration of information from unstructured data such as biomedical literature. METHODS: In this paper we evaluate two importa...

Bibliographic Details
Main authors:	López-Úbeda, Pilar; Díaz-Galiano, Manuel Carlos; Ureña-López, L. Alfonso; Martín-Valdivia, M. Teresa
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8684055/ https://www.ncbi.nlm.nih.gov/pubmed/34920708 http://dx.doi.org/10.1186/s12859-021-04188-3

_version_	1784617538871623680
author	López-Úbeda, Pilar; Díaz-Galiano, Manuel Carlos; Ureña-López, L. Alfonso; Martín-Valdivia, M. Teresa
author_facet	López-Úbeda, Pilar; Díaz-Galiano, Manuel Carlos; Ureña-López, L. Alfonso; Martín-Valdivia, M. Teresa
author_sort	López-Úbeda, Pilar
collection	PubMed
description	BACKGROUND: Natural language processing (NLP) and text mining technologies for the extraction and indexing of chemical and drug entities are key to improving the access and integration of information from unstructured data such as biomedical literature. METHODS: In this paper we evaluate two important NLP tasks: named entity recognition (NER) and entity indexing using the SNOMED-CT terminology. For this purpose, we propose a combination of word embeddings in order to improve the results obtained in the PharmaCoNER challenge. RESULTS: For the NER task we present a neural network composed of a BiLSTM with a CRF sequential layer, where different word embeddings are combined as input to the architecture. A hybrid method combining supervised and unsupervised models is used for the concept indexing task. In the supervised model, we use the training set to find previously trained concepts, and the unsupervised model is based on a 6-step architecture. This architecture uses a dictionary of synonyms and the Levenshtein distance to assign the correct SNOMED-CT code. CONCLUSION: On the one hand, the combination of word embeddings helps to improve the recognition of chemicals and drugs in the biomedical literature. We achieved 91.41% precision, 90.14% recall, and a 90.77% F1-score using micro-averaging. On the other hand, our indexing system achieves a 92.67% F1-score, 92.44% recall, and 92.91% precision. These results would place us first in the final ranking.
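The description mentions a synonym dictionary combined with the Levenshtein distance for code assignment, and reports precision, recall, and F1-score. The following is a minimal Python sketch of those two ideas only, not the authors' 6-step architecture. The names `closest_code`, `SYNONYMS`, and `max_distance`, and the `CODE-…` values, are hypothetical placeholders (they are not real SNOMED-CT identifiers). The `f1_score` helper uses the standard harmonic-mean definition.

```python
from typing import Dict, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_code(mention: str, dictionary: Dict[str, str],
                 max_distance: int = 2) -> Optional[str]:
    """Return the code of the nearest dictionary term, or None if too far.

    Assumption: a fixed distance threshold; the source does not state one.
    """
    mention = mention.lower()
    best_term = min(dictionary, key=lambda term: levenshtein(mention, term))
    if levenshtein(mention, best_term) > max_distance:
        return None
    return dictionary[best_term]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical synonym dictionary with placeholder codes.
SYNONYMS = {
    "paracetamol": "CODE-0001",
    "acetaminophen": "CODE-0001",
    "ibuprofen": "CODE-0002",
}

if __name__ == "__main__":
    print(closest_code("ibuprofn", SYNONYMS))  # misspelled mention
    print(round(f1_score(91.41, 90.14), 2))    # NER figures from the abstract
    print(round(f1_score(92.91, 92.44), 2))    # indexing figures from the abstract
```

Applying `f1_score` to the reported precision and recall gives values that agree with the F1-scores stated in the description (90.77% and 92.67%).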
format	Online Article Text
id	pubmed-8684055
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8684055 2021-12-20 Combining word embeddings to extract chemical and drug entities in biomedical literature López-Úbeda, Pilar; Díaz-Galiano, Manuel Carlos; Ureña-López, L. Alfonso; Martín-Valdivia, M. Teresa BMC Bioinformatics Research
BioMed Central 2021-12-17 /pmc/articles/PMC8684055/ /pubmed/34920708 http://dx.doi.org/10.1186/s12859-021-04188-3 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research López-Úbeda, Pilar Díaz-Galiano, Manuel Carlos Ureña-López, L. Alfonso Martín-Valdivia, M. Teresa Combining word embeddings to extract chemical and drug entities in biomedical literature
title	Combining word embeddings to extract chemical and drug entities in biomedical literature
title_sort	combining word embeddings to extract chemical and drug entities in biomedical literature
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8684055/ https://www.ncbi.nlm.nih.gov/pubmed/34920708 http://dx.doi.org/10.1186/s12859-021-04188-3

Similar items