Gold-standard ontology-based anatomical annotation in the CRAFT Corpus

Gold-standard annotated corpora have become important resources for the training and testing of natural-language-processing (NLP) systems designed to support biocuration efforts, and ontologies are increasingly used to facilitate curational consistency and semantic integration across disparate resou...

Bibliographic Details
Main authors: Bada, Michael, Vasilevsky, Nicole, Baumgartner, William A, Haendel, Melissa, Hunter, Lawrence E
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7243923/
https://www.ncbi.nlm.nih.gov/pubmed/31725864
http://dx.doi.org/10.1093/database/bax087
_version_ 1783537488304799744
author Bada, Michael
Vasilevsky, Nicole
Baumgartner, William A
Haendel, Melissa
Hunter, Lawrence E
author_sort Bada, Michael
collection PubMed
description Gold-standard annotated corpora have become important resources for the training and testing of natural-language-processing (NLP) systems designed to support biocuration efforts, and ontologies are increasingly used to facilitate curational consistency and semantic integration across disparate resources. Bringing together the respective power of these, the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of full-length, open-access biomedical journal articles with extensive manually created syntactic, formatting and semantic markup, was previously created and released. This initial public release has already been used in multiple projects to drive development of systems focused on a variety of biocuration, search, visualization, and semantic and syntactic NLP tasks. Building on its demonstrated utility, we have expanded the CRAFT Corpus with a large set of manually created semantic annotations relying on Uberon, an ontology representing anatomical entities and life-cycle stages of multicellular organisms across species as well as types of multicellular organisms defined in terms of life-cycle stage and sexual characteristics. This newly created set of annotations, which has been added for v2.1 of the corpus, is by far the largest publicly available collection of gold-standard anatomical markup and is the first large-scale effort at manual markup of biomedical text relying on the entirety of an anatomical terminology, as opposed to annotation with a small number of high-level anatomical categories, as performed in previous corpora. In addition to presenting and discussing this newly available resource, we apply it to provide a performance baseline for the automatic annotation of anatomical concepts in biomedical text using a prominent concept recognition system. The full corpus, released with a CC BY 3.0 license, may be downloaded from http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml. 
Database URL: http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml
format Online
Article
Text
id pubmed-7243923
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7243923 2020-05-27 Gold-standard ontology-based anatomical annotation in the CRAFT Corpus Bada, Michael Vasilevsky, Nicole Baumgartner, William A Haendel, Melissa Hunter, Lawrence E Database (Oxford) Original Article Oxford University Press 2017-12-27 /pmc/articles/PMC7243923/ /pubmed/31725864 http://dx.doi.org/10.1093/database/bax087 Text en © The Author(s) 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Gold-standard ontology-based anatomical annotation in the CRAFT Corpus
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7243923/
https://www.ncbi.nlm.nih.gov/pubmed/31725864
http://dx.doi.org/10.1093/database/bax087