Deep learning with word embeddings improves biomedical named entity recognition

MOTIVATION: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface...

Bibliographic Details
Main Authors:	Habibi, Maryam, Weber, Leon, Neves, Mariana, Wiegandt, David Luis, Leser, Ulf
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870729/ https://www.ncbi.nlm.nih.gov/pubmed/28881963 http://dx.doi.org/10.1093/bioinformatics/btx228

_version_	1783309541932269568
author	Habibi, Maryam Weber, Leon Neves, Mariana Wiegandt, David Luis Leser, Ulf
author_sort	Habibi, Maryam
collection	PubMed
description	MOTIVATION: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult. RESULTS: We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall. AVAILABILITY AND IMPLEMENTATION: The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/.
format	Online Article Text
id	pubmed-5870729
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-58707292018-04-05 Deep learning with word embeddings improves biomedical named entity recognition Habibi, Maryam Weber, Leon Neves, Mariana Wiegandt, David Luis Leser, Ulf Bioinformatics Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017 MOTIVATION: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult. RESULTS: We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall. AVAILABILITY AND IMPLEMENTATION: The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/. 
Oxford University Press 2017-07-15 2017-07-12 /pmc/articles/PMC5870729/ /pubmed/28881963 http://dx.doi.org/10.1093/bioinformatics/btx228 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Deep learning with word embeddings improves biomedical named entity recognition
topic	Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870729/ https://www.ncbi.nlm.nih.gov/pubmed/28881963 http://dx.doi.org/10.1093/bioinformatics/btx228
