A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain
Sentence boundary detection (SBD) is a critical preprocessing task for many natural language processing (NLP) applications. However, there has been little work on evaluating how well existing methods for SBD perform in the clinical domain. We evaluate five popular off-the-shelf NLP toolkits on the t...
Main authors: | Griffis, Denis; Shivade, Chaitanya; Fosler-Lussier, Eric; Lai, Albert M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Medical Informatics Association, 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001746/ https://www.ncbi.nlm.nih.gov/pubmed/27570656 |
_version_ | 1782450475536220160 |
---|---|
author | Griffis, Denis; Shivade, Chaitanya; Fosler-Lussier, Eric; Lai, Albert M. |
author_facet | Griffis, Denis; Shivade, Chaitanya; Fosler-Lussier, Eric; Lai, Albert M. |
author_sort | Griffis, Denis |
collection | PubMed |
description | Sentence boundary detection (SBD) is a critical preprocessing task for many natural language processing (NLP) applications. However, there has been little work on evaluating how well existing methods for SBD perform in the clinical domain. We evaluate five popular off-the-shelf NLP toolkits on the task of SBD in various kinds of text using a diverse set of corpora, including the GENIA corpus of biomedical abstracts, a corpus of clinical notes used in the 2010 i2b2 shared task, and two general-domain corpora (the British National Corpus and Switchboard). We find that, with the exception of the cTAKES system, the toolkits we evaluate perform noticeably worse on clinical text than on general-domain text. We identify and discuss major classes of errors, and suggest directions for future work to improve SBD methods in the clinical domain. We also make the code used for SBD evaluation in this paper available for download at http://github.com/drgriffis/SBD-Evaluation. |
format | Online Article Text |
id | pubmed-5001746 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | American Medical Informatics Association |
record_format | MEDLINE/PubMed |
spelling | pubmed-50017462016-08-26 A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain Griffis, Denis Shivade, Chaitanya Fosler-Lussier, Eric Lai, Albert M. AMIA Jt Summits Transl Sci Proc Articles Sentence boundary detection (SBD) is a critical preprocessing task for many natural language processing (NLP) applications. However, there has been little work on evaluating how well existing methods for SBD perform in the clinical domain. We evaluate five popular off-the-shelf NLP toolkits on the task of SBD in various kinds of text using a diverse set of corpora, including the GENIA corpus of biomedical abstracts, a corpus of clinical notes used in the 2010 i2b2 shared task, and two general-domain corpora (the British National Corpus and Switchboard). We find that, with the exception of the cTAKES system, the toolkits we evaluate perform noticeably worse on clinical text than on general-domain text. We identify and discuss major classes of errors, and suggest directions for future work to improve SBD methods in the clinical domain. We also make the code used for SBD evaluation in this paper available for download at http://github.com/drgriffis/SBD-Evaluation. American Medical Informatics Association 2016-07-20 /pmc/articles/PMC5001746/ /pubmed/27570656 Text en ©2016 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose |
spellingShingle | Articles Griffis, Denis Shivade, Chaitanya Fosler-Lussier, Eric Lai, Albert M. A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain |
title | A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain |
title_full | A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain |
title_fullStr | A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain |
title_full_unstemmed | A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain |
title_short | A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain |
title_sort | quantitative and qualitative evaluation of sentence boundary detection for the clinical domain |
topic | Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001746/ https://www.ncbi.nlm.nih.gov/pubmed/27570656 |