NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding
Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades(1,2), the most dramatic advances in MR have followed in the wake of critical corpus development(3). Large, well-annotated corpora have been associated...
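Main authors: | Wang, Kanix; Stevens, Robert; Alachram, Halima; Li, Yu; Soldatova, Larisa; King, Ross; Ananiadou, Sophia; Schoene, Annika M.; Li, Maolin; Christopoulou, Fenia; Ambite, José Luis; Matthew, Joel; Garg, Sahil; Hermjakob, Ulf; Marcu, Daniel; Sheng, Emily; Beißbarth, Tim; Wingender, Edgar; Galstyan, Aram; Gao, Xin; Chambers, Brendan; Pan, Weidi; Khomtchouk, Bohdan B.; Evans, James A.; Rzhetsky, Andrey |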
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8528865/ https://www.ncbi.nlm.nih.gov/pubmed/34671039 http://dx.doi.org/10.1038/s41540-021-00200-x |
_version_ | 1784586342206799872 |
---|---|
author | Wang, Kanix Stevens, Robert Alachram, Halima Li, Yu Soldatova, Larisa King, Ross Ananiadou, Sophia Schoene, Annika M. Li, Maolin Christopoulou, Fenia Ambite, José Luis Matthew, Joel Garg, Sahil Hermjakob, Ulf Marcu, Daniel Sheng, Emily Beißbarth, Tim Wingender, Edgar Galstyan, Aram Gao, Xin Chambers, Brendan Pan, Weidi Khomtchouk, Bohdan B. Evans, James A. Rzhetsky, Andrey |
author_facet | Wang, Kanix Stevens, Robert Alachram, Halima Li, Yu Soldatova, Larisa King, Ross Ananiadou, Sophia Schoene, Annika M. Li, Maolin Christopoulou, Fenia Ambite, José Luis Matthew, Joel Garg, Sahil Hermjakob, Ulf Marcu, Daniel Sheng, Emily Beißbarth, Tim Wingender, Edgar Galstyan, Aram Gao, Xin Chambers, Brendan Pan, Weidi Khomtchouk, Bohdan B. Evans, James A. Rzhetsky, Andrey |
author_sort | Wang, Kanix |
collection | PubMed |
description | Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades(1,2), the most dramatic advances in MR have followed in the wake of critical corpus development(3). Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet(4) was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new, Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, named entity recognition (NER) automated extraction, and; (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus. |
format | Online Article Text |
id | pubmed-8528865 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-85288652021-10-22 NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding Wang, Kanix Stevens, Robert Alachram, Halima Li, Yu Soldatova, Larisa King, Ross Ananiadou, Sophia Schoene, Annika M. Li, Maolin Christopoulou, Fenia Ambite, José Luis Matthew, Joel Garg, Sahil Hermjakob, Ulf Marcu, Daniel Sheng, Emily Beißbarth, Tim Wingender, Edgar Galstyan, Aram Gao, Xin Chambers, Brendan Pan, Weidi Khomtchouk, Bohdan B. Evans, James A. Rzhetsky, Andrey NPJ Syst Biol Appl Article Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades(1,2), the most dramatic advances in MR have followed in the wake of critical corpus development(3). Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet(4) was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new, Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, named entity recognition (NER) automated extraction, and; (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus. 
Nature Publishing Group UK 2021-10-20 /pmc/articles/PMC8528865/ /pubmed/34671039 http://dx.doi.org/10.1038/s41540-021-00200-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Wang, Kanix Stevens, Robert Alachram, Halima Li, Yu Soldatova, Larisa King, Ross Ananiadou, Sophia Schoene, Annika M. Li, Maolin Christopoulou, Fenia Ambite, José Luis Matthew, Joel Garg, Sahil Hermjakob, Ulf Marcu, Daniel Sheng, Emily Beißbarth, Tim Wingender, Edgar Galstyan, Aram Gao, Xin Chambers, Brendan Pan, Weidi Khomtchouk, Bohdan B. Evans, James A. Rzhetsky, Andrey NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding |
title | NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding |
title_full | NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding |
title_fullStr | NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding |
title_full_unstemmed | NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding |
title_short | NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding |
title_sort | nero: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8528865/ https://www.ncbi.nlm.nih.gov/pubmed/34671039 http://dx.doi.org/10.1038/s41540-021-00200-x |