
NOBLE – Flexible concept recognition for large-scale biomedical natural language processing

BACKGROUND: Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a no...


Bibliographic Details
Main Authors: Tseytlin, Eugene; Mitchell, Kevin; Legowski, Elizabeth; Corrigan, Julia; Chavan, Girish; Jacobson, Rebecca S.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4712516/
https://www.ncbi.nlm.nih.gov/pubmed/26763894
http://dx.doi.org/10.1186/s12859-015-0871-y
author Tseytlin, Eugene
Mitchell, Kevin
Legowski, Elizabeth
Corrigan, Julia
Chavan, Girish
Jacobson, Rebecca S.
author_sort Tseytlin, Eugene
collection PubMed
description BACKGROUND: Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system’s matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator. RESULTS: We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE’s performance exceeded commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems. CONCLUSION: NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines.
format Online
Article
Text
id pubmed-4712516
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4712516 2016-01-15 NOBLE – Flexible concept recognition for large-scale biomedical natural language processing Tseytlin, Eugene Mitchell, Kevin Legowski, Elizabeth Corrigan, Julia Chavan, Girish Jacobson, Rebecca S. BMC Bioinformatics Software BACKGROUND: Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system’s matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator. RESULTS: We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE’s performance exceeded commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems. CONCLUSION: NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed.
Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines. BioMed Central 2016-01-14 /pmc/articles/PMC4712516/ /pubmed/26763894 http://dx.doi.org/10.1186/s12859-015-0871-y Text en © Tseytlin et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title NOBLE – Flexible concept recognition for large-scale biomedical natural language processing
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4712516/
https://www.ncbi.nlm.nih.gov/pubmed/26763894
http://dx.doi.org/10.1186/s12859-015-0871-y