SciNER: Extracting Named Entities from Scientific Literature

The automated extraction of claims from scientific papers via computer is difficult due to the ambiguity and variability inherent in natural language. Even apparently simple tasks, such as isolating reported values for physical quantities (e.g., “the melting point of X is Y”) can be complicated by s...

Bibliographic Details
Main Authors: Hong, Zhi, Tchoua, Roselyne, Chard, Kyle, Foster, Ian
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302801/
http://dx.doi.org/10.1007/978-3-030-50417-5_23
_version_ 1783547923821232128
author Hong, Zhi
Tchoua, Roselyne
Chard, Kyle
Foster, Ian
author_facet Hong, Zhi
Tchoua, Roselyne
Chard, Kyle
Foster, Ian
author_sort Hong, Zhi
collection PubMed
description The automated extraction of claims from scientific papers via computer is difficult due to the ambiguity and variability inherent in natural language. Even apparently simple tasks, such as isolating reported values for physical quantities (e.g., “the melting point of X is Y”) can be complicated by such factors as domain-specific conventions about how named entities (the X in the example) are referenced. Although there are domain-specific toolkits that can handle such complications in certain areas, a generalizable, adaptable model for scientific texts is still lacking. As a first step towards automating this process, we present a generalizable neural network model, SciNER, for recognizing scientific entities in free text. Based on bidirectional LSTM networks, our model combines word embeddings, subword embeddings, and external knowledge (from DBpedia) to boost its accuracy. Experiments show that our model outperforms a leading domain-specific extraction toolkit by up to 50%, as measured by F1 score, while also being easily adapted to new domains.
format Online
Article
Text
id pubmed-7302801
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-73028012020-06-19 SciNER: Extracting Named Entities from Scientific Literature Hong, Zhi Tchoua, Roselyne Chard, Kyle Foster, Ian Computational Science – ICCS 2020 Article The automated extraction of claims from scientific papers via computer is difficult due to the ambiguity and variability inherent in natural language. Even apparently simple tasks, such as isolating reported values for physical quantities (e.g., “the melting point of X is Y”) can be complicated by such factors as domain-specific conventions about how named entities (the X in the example) are referenced. Although there are domain-specific toolkits that can handle such complications in certain areas, a generalizable, adaptable model for scientific texts is still lacking. As a first step towards automating this process, we present a generalizable neural network model, SciNER, for recognizing scientific entities in free text. Based on bidirectional LSTM networks, our model combines word embeddings, subword embeddings, and external knowledge (from DBpedia) to boost its accuracy. Experiments show that our model outperforms a leading domain-specific extraction toolkit by up to 50%, as measured by F1 score, while also being easily adapted to new domains. 2020-06-15 /pmc/articles/PMC7302801/ http://dx.doi.org/10.1007/978-3-030-50417-5_23 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Hong, Zhi
Tchoua, Roselyne
Chard, Kyle
Foster, Ian
SciNER: Extracting Named Entities from Scientific Literature
title SciNER: Extracting Named Entities from Scientific Literature
title_full SciNER: Extracting Named Entities from Scientific Literature
title_fullStr SciNER: Extracting Named Entities from Scientific Literature
title_full_unstemmed SciNER: Extracting Named Entities from Scientific Literature
title_short SciNER: Extracting Named Entities from Scientific Literature
title_sort sciner: extracting named entities from scientific literature
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302801/
http://dx.doi.org/10.1007/978-3-030-50417-5_23
work_keys_str_mv AT hongzhi scinerextractingnamedentitiesfromscientificliterature
AT tchouaroselyne scinerextractingnamedentitiesfromscientificliterature
AT chardkyle scinerextractingnamedentitiesfromscientificliterature
AT fosterian scinerextractingnamedentitiesfromscientificliterature