A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain

Sentence boundary detection (SBD) is a critical preprocessing task for many natural language processing (NLP) applications. However, there has been little work on evaluating how well existing methods for SBD perform in the clinical domain. We evaluate five popular off-the-shelf NLP toolkits on the t...

Bibliographic Details
Main Authors: Griffis, Denis; Shivade, Chaitanya; Fosler-Lussier, Eric; Lai, Albert M.
Format: Online Article Text
Language: English
Published: American Medical Informatics Association 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001746/
https://www.ncbi.nlm.nih.gov/pubmed/27570656
author Griffis, Denis
Shivade, Chaitanya
Fosler-Lussier, Eric
Lai, Albert M.
collection PubMed
description Sentence boundary detection (SBD) is a critical preprocessing task for many natural language processing (NLP) applications. However, there has been little work on evaluating how well existing methods for SBD perform in the clinical domain. We evaluate five popular off-the-shelf NLP toolkits on the task of SBD in various kinds of text using a diverse set of corpora, including the GENIA corpus of biomedical abstracts, a corpus of clinical notes used in the 2010 i2b2 shared task, and two general-domain corpora (the British National Corpus and Switchboard). We find that, with the exception of the cTAKES system, the toolkits we evaluate perform noticeably worse on clinical text than on general-domain text. We identify and discuss major classes of errors, and suggest directions for future work to improve SBD methods in the clinical domain. We also make the code used for SBD evaluation in this paper available for download at http://github.com/drgriffis/SBD-Evaluation.
format Online
Article
Text
id pubmed-5001746
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher American Medical Informatics Association
record_format MEDLINE/PubMed
spelling pubmed-5001746
2016-08-26
A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain
Griffis, Denis
Shivade, Chaitanya
Fosler-Lussier, Eric
Lai, Albert M.
AMIA Jt Summits Transl Sci Proc
Articles
Sentence boundary detection (SBD) is a critical preprocessing task for many natural language processing (NLP) applications. However, there has been little work on evaluating how well existing methods for SBD perform in the clinical domain. We evaluate five popular off-the-shelf NLP toolkits on the task of SBD in various kinds of text using a diverse set of corpora, including the GENIA corpus of biomedical abstracts, a corpus of clinical notes used in the 2010 i2b2 shared task, and two general-domain corpora (the British National Corpus and Switchboard). We find that, with the exception of the cTAKES system, the toolkits we evaluate perform noticeably worse on clinical text than on general-domain text. We identify and discuss major classes of errors, and suggest directions for future work to improve SBD methods in the clinical domain. We also make the code used for SBD evaluation in this paper available for download at http://github.com/drgriffis/SBD-Evaluation.
American Medical Informatics Association
2016-07-20
/pmc/articles/PMC5001746/
/pubmed/27570656
Text
en
©2016 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose
title A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001746/
https://www.ncbi.nlm.nih.gov/pubmed/27570656