
Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (LexEBI)

MOTIVATION: Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical “term space” (the “Lexeome”), forms a key resource to achieve the full integrat...


Bibliographic Details
Main authors: Rebholz-Schuhmann, Dietrich; Kim, Jee-Hyub; Yan, Ying; Dixit, Abhishek; Friteyre, Caroline; Hoehndorf, Robert; Backofen, Rolf; Lewin, Ian
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3790750/
https://www.ncbi.nlm.nih.gov/pubmed/24124474
http://dx.doi.org/10.1371/journal.pone.0075185
_version_ 1782286639529197568
author Rebholz-Schuhmann, Dietrich
Kim, Jee-Hyub
Yan, Ying
Dixit, Abhishek
Friteyre, Caroline
Hoehndorf, Robert
Backofen, Rolf
Lewin, Ian
author_sort Rebholz-Schuhmann, Dietrich
collection PubMed
description MOTIVATION: Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical “term space” (the “Lexeome”), forms a key resource to achieve the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal does not only require that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness).
RESULT: This study compiles a resource for lexical terms of biomedical interest in a standard format (called “LexEBI”), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants and chemical entities amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show only little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities as well as the chemical entities, both do comprise enzymes leading to hierarchical polysemy, and a large portion of PGNs make reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions.
CONCLUSION: LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). The resource provides the disease terms as open source content, and fully interlinks terms across resources.
format Online
Article
Text
id pubmed-3790750
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3790750 2013-10-11 Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (LexEBI). Rebholz-Schuhmann, Dietrich; Kim, Jee-Hyub; Yan, Ying; Dixit, Abhishek; Friteyre, Caroline; Hoehndorf, Robert; Backofen, Rolf; Lewin, Ian. PLoS One, Research Article. Public Library of Science, 2013-10-04. /pmc/articles/PMC3790750/ /pubmed/24124474 http://dx.doi.org/10.1371/journal.pone.0075185 Text, en. © 2013 Rebholz-Schuhmann et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (LexEBI)
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3790750/
https://www.ncbi.nlm.nih.gov/pubmed/24124474
http://dx.doi.org/10.1371/journal.pone.0075185