SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes

BACKGROUND: Despite a wide adoption of English in science, a significant amount of biomedical data are produced in other languages, such as French. Yet a majority of natural language processing or semantic tools as well as domain terminologies or ontologies are only available in English, and cannot...

Bibliographic Details
Main Authors: Tchechmedjiev, Andon; Abdaoui, Amine; Emonet, Vincent; Zevio, Stella; Jonquet, Clement
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6218966/
https://www.ncbi.nlm.nih.gov/pubmed/30400805
http://dx.doi.org/10.1186/s12859-018-2429-2
_version_ 1783368553860169728
author Tchechmedjiev, Andon
Abdaoui, Amine
Emonet, Vincent
Zevio, Stella
Jonquet, Clement
author_facet Tchechmedjiev, Andon
Abdaoui, Amine
Emonet, Vincent
Zevio, Stella
Jonquet, Clement
author_sort Tchechmedjiev, Andon
collection PubMed
description BACKGROUND: Despite a wide adoption of English in science, a significant amount of biomedical data are produced in other languages, such as French. Yet a majority of natural language processing or semantic tools as well as domain terminologies or ontologies are only available in English, and cannot be readily applied to other languages, due to fundamental linguistic differences. However, semantic resources are required to design semantic indexes and transform biomedical (text) data into knowledge for better information mining and retrieval. RESULTS: We present the SIFR Annotator (http://bioportal.lirmm.fr/annotator), a publicly accessible ontology-based annotation web service to process biomedical text data in French. The service, developed during the Semantic Indexing of French Biomedical Data Resources (2013–2019) project, is included in the SIFR BioPortal, an open platform to host French biomedical ontologies and terminologies based on the technology developed by the US National Center for Biomedical Ontology. The portal facilitates the use and fostering of ontologies by offering a set of services –search, mappings, metadata, versioning, visualization, recommendation– including for annotation purposes. We introduce the adaptations and improvements made in applying the technology to French as well as a number of language-independent additional features –implemented by means of a proxy architecture– in particular annotation scoring and clinical context detection. We evaluate the performance of the SIFR Annotator on different biomedical data, using available French corpora –Quaero (titles from French MEDLINE abstracts and EMEA drug labels) and CépiDC (ICD-10 coding of death certificates)– and discuss our results with respect to the CLEF eHealth information extraction tasks.
CONCLUSIONS: We show the web service performs comparably to other knowledge-based annotation approaches in recognizing entities in biomedical text and reaches state-of-the-art levels in clinical context detection (negation, experiencer, temporality). Additionally, the SIFR Annotator is the first openly web-accessible tool to annotate and contextualize French biomedical text with ontology concepts leveraging a dictionary currently made of 28 terminologies and ontologies and 333 K concepts. The code is openly available, and we also provide a Docker packaging for easy local deployment to process sensitive (e.g., clinical) data in-house (https://github.com/sifrproject).
format Online
Article
Text
id pubmed-6218966
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6218966 2018-11-08 SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes Tchechmedjiev, Andon Abdaoui, Amine Emonet, Vincent Zevio, Stella Jonquet, Clement BMC Bioinformatics Software BACKGROUND: Despite a wide adoption of English in science, a significant amount of biomedical data are produced in other languages, such as French. Yet a majority of natural language processing or semantic tools as well as domain terminologies or ontologies are only available in English, and cannot be readily applied to other languages, due to fundamental linguistic differences. However, semantic resources are required to design semantic indexes and transform biomedical (text) data into knowledge for better information mining and retrieval. RESULTS: We present the SIFR Annotator (http://bioportal.lirmm.fr/annotator), a publicly accessible ontology-based annotation web service to process biomedical text data in French. The service, developed during the Semantic Indexing of French Biomedical Data Resources (2013–2019) project, is included in the SIFR BioPortal, an open platform to host French biomedical ontologies and terminologies based on the technology developed by the US National Center for Biomedical Ontology. The portal facilitates the use and fostering of ontologies by offering a set of services –search, mappings, metadata, versioning, visualization, recommendation– including for annotation purposes. We introduce the adaptations and improvements made in applying the technology to French as well as a number of language-independent additional features –implemented by means of a proxy architecture– in particular annotation scoring and clinical context detection.
We evaluate the performance of the SIFR Annotator on different biomedical data, using available French corpora –Quaero (titles from French MEDLINE abstracts and EMEA drug labels) and CépiDC (ICD-10 coding of death certificates)– and discuss our results with respect to the CLEF eHealth information extraction tasks. CONCLUSIONS: We show the web service performs comparably to other knowledge-based annotation approaches in recognizing entities in biomedical text and reaches state-of-the-art levels in clinical context detection (negation, experiencer, temporality). Additionally, the SIFR Annotator is the first openly web-accessible tool to annotate and contextualize French biomedical text with ontology concepts leveraging a dictionary currently made of 28 terminologies and ontologies and 333 K concepts. The code is openly available, and we also provide a Docker packaging for easy local deployment to process sensitive (e.g., clinical) data in-house (https://github.com/sifrproject). BioMed Central 2018-11-06 /pmc/articles/PMC6218966/ /pubmed/30400805 http://dx.doi.org/10.1186/s12859-018-2429-2 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Tchechmedjiev, Andon
Abdaoui, Amine
Emonet, Vincent
Zevio, Stella
Jonquet, Clement
SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes
title SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes
title_full SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes
title_fullStr SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes
title_full_unstemmed SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes
title_short SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes
title_sort sifr annotator: ontology-based semantic annotation of french biomedical text and clinical notes
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6218966/
https://www.ncbi.nlm.nih.gov/pubmed/30400805
http://dx.doi.org/10.1186/s12859-018-2429-2
work_keys_str_mv AT tchechmedjievandon sifrannotatorontologybasedsemanticannotationoffrenchbiomedicaltextandclinicalnotes
AT abdaouiamine sifrannotatorontologybasedsemanticannotationoffrenchbiomedicaltextandclinicalnotes
AT emonetvincent sifrannotatorontologybasedsemanticannotationoffrenchbiomedicaltextandclinicalnotes
AT zeviostella sifrannotatorontologybasedsemanticannotationoffrenchbiomedicaltextandclinicalnotes
AT jonquetclement sifrannotatorontologybasedsemanticannotationoffrenchbiomedicaltextandclinicalnotes