Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life
Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggreg...
Main Authors: | Thessen, Anne E.; Parr, Cynthia Sims |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2014 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3940440/ https://www.ncbi.nlm.nih.gov/pubmed/24594988 http://dx.doi.org/10.1371/journal.pone.0089550 |
_version_ | 1782305787151908864 |
---|---|
author | Thessen, Anne E. Parr, Cynthia Sims |
author_facet | Thessen, Anne E. Parr, Cynthia Sims |
author_sort | Thessen, Anne E. |
collection | PubMed |
description | Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. Another workflow finds taxon names in text using GNRD for the purpose of building a species association network. Both workflows work well: the annotation workflow has an F1 Score of 0.941 and the association algorithm has an F1 Score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight performed well, but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics. |
format | Online Article Text |
id | pubmed-3940440 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-39404402014-03-06 Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life Thessen, Anne E. Parr, Cynthia Sims PLoS One Research Article Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. Another workflow finds taxon names in text using GNRD for the purpose of building a species association network. Both workflows work well: the annotation workflow has an F1 Score of 0.941 and the association algorithm has an F1 Score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight performed well, but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics. Public Library of Science 2014-03-03 /pmc/articles/PMC3940440/ /pubmed/24594988 http://dx.doi.org/10.1371/journal.pone.0089550 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. |
spellingShingle | Research Article Thessen, Anne E. Parr, Cynthia Sims Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life |
title | Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life |
title_full | Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life |
title_fullStr | Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life |
title_full_unstemmed | Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life |
title_short | Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life |
title_sort | knowledge extraction and semantic annotation of text from the encyclopedia of life |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3940440/ https://www.ncbi.nlm.nih.gov/pubmed/24594988 http://dx.doi.org/10.1371/journal.pone.0089550 |
work_keys_str_mv | AT thessenannee knowledgeextractionandsemanticannotationoftextfromtheencyclopediaoflife AT parrcynthiasims knowledgeextractionandsemanticannotationoftextfromtheencyclopediaoflife |