
NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding

Bibliographic Details
Main authors: Wang, Kanix; Stevens, Robert; Alachram, Halima; Li, Yu; Soldatova, Larisa; King, Ross; Ananiadou, Sophia; Schoene, Annika M.; Li, Maolin; Christopoulou, Fenia; Ambite, José Luis; Matthew, Joel; Garg, Sahil; Hermjakob, Ulf; Marcu, Daniel; Sheng, Emily; Beißbarth, Tim; Wingender, Edgar; Galstyan, Aram; Gao, Xin; Chambers, Brendan; Pan, Weidi; Khomtchouk, Bohdan B.; Evans, James A.; Rzhetsky, Andrey
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group UK, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8528865/
https://www.ncbi.nlm.nih.gov/pubmed/34671039
http://dx.doi.org/10.1038/s41540-021-00200-x
Collection: PubMed
Description: Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades(1,2), the most dramatic advances in MR have followed in the wake of critical corpus development(3). Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems, in the same way that ImageNet(4) was fundamental to the development of machine vision techniques. This study contributes six components to an advanced named entity analysis tool for biomedicine: (a) a new Named Entity Recognition Ontology (NERO), developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity and bridges the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to reduce the annotation burden on curators; (d) an original annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf automated named entity recognition (NER) extraction; and (f) embedding models that demonstrate the promise of the biomedical associations embedded within this corpus.
ID: pubmed-8528865
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: NPJ Syst Biol Appl
Date: 2021-10-20 (Nature Publishing Group UK)
Copyright: © The Author(s) 2021
License: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.