
Detection of sentence boundaries and abbreviations in clinical narratives

BACKGROUND: In Western languages the period character is highly ambiguous, due to its double role as sentence delimiter and abbreviation marker. This is particularly relevant in clinical free-texts characterized by numerous anomalies in spelling, punctuation, vocabulary and with a high frequency of...


Bibliographic Details
Main authors:	Kreuzthaler, Markus; Schulz, Stefan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4474545/ https://www.ncbi.nlm.nih.gov/pubmed/26099994 http://dx.doi.org/10.1186/1472-6947-15-S2-S4

_version_	1782377287408156672
author	Kreuzthaler, Markus; Schulz, Stefan
author_facet	Kreuzthaler, Markus; Schulz, Stefan
author_sort	Kreuzthaler, Markus
collection	PubMed
description	BACKGROUND: In Western languages the period character is highly ambiguous, due to its double role as sentence delimiter and abbreviation marker. This is particularly relevant in clinical free-texts characterized by numerous anomalies in spelling, punctuation, vocabulary and with a high frequency of short forms. METHODS: The problem is addressed by two binary classifiers for abbreviation and sentence detection. A support vector machine exploiting a linear kernel is trained on different combinations of feature sets for each classification task. Feature relevance ranking is applied to investigate which features are important for the particular task. The methods are applied to German language texts from a medical record system, authored by specialized physicians. RESULTS: Two collections of 3,024 text snippets were annotated regarding the role of period characters for training and testing. Cohen's kappa resulted in 0.98. For abbreviation and sentence boundary detection we can report an unweighted micro-averaged F-measure using a 10-fold cross validation of 0.97 for the training set. For test set based evaluation we obtained an unweighted micro-averaged F-measure of 0.95 for abbreviation detection and 0.94 for sentence delineation. Language-dependent resources and rules were found to have less impact on abbreviation detection than on sentence delineation. CONCLUSIONS: Sentence detection is an important task, which should be performed at the beginning of a text processing pipeline. For the text genre under scrutiny we showed that support vector machines exploiting a linear kernel produce state of the art results for sentence boundary detection. The results are comparable with other sentence boundary detection methods applied to English clinical texts. We identified abbreviation detection as a supportive task for sentence delineation.
format	Online Article Text
id	pubmed-4474545
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4474545 2015-06-25 Detection of sentence boundaries and abbreviations in clinical narratives Kreuzthaler, Markus; Schulz, Stefan BMC Med Inform Decis Mak Proceedings BioMed Central 2015-06-15 /pmc/articles/PMC4474545/ /pubmed/26099994 http://dx.doi.org/10.1186/1472-6947-15-S2-S4 Text en Copyright © 2015 Kreuzthaler and Schulz; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Proceedings Kreuzthaler, Markus Schulz, Stefan Detection of sentence boundaries and abbreviations in clinical narratives
title	Detection of sentence boundaries and abbreviations in clinical narratives
title_full	Detection of sentence boundaries and abbreviations in clinical narratives
title_fullStr	Detection of sentence boundaries and abbreviations in clinical narratives
title_full_unstemmed	Detection of sentence boundaries and abbreviations in clinical narratives
title_short	Detection of sentence boundaries and abbreviations in clinical narratives
title_sort	detection of sentence boundaries and abbreviations in clinical narratives
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4474545/ https://www.ncbi.nlm.nih.gov/pubmed/26099994 http://dx.doi.org/10.1186/1472-6947-15-S2-S4
work_keys_str_mv	AT kreuzthalermarkus detectionofsentenceboundariesandabbreviationsinclinicalnarratives AT schulzstefan detectionofsentenceboundariesandabbreviationsinclinicalnarratives
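
The abstract in this record reports inter-annotator agreement as Cohen's kappa and classifier quality as a micro-averaged F-measure. As a minimal sketch of how these two figures are commonly computed, the pure-Python functions below work on plain lists of labels. The function names, the label names `abbr` and `end`, and the toy annotations are hypothetical and are not taken from the article; the authors' own evaluation code is not part of this record.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def micro_f1(gold, predicted):
    """Micro-averaged F-measure: pool TP/FP/FN over all classes first."""
    classes = set(gold) | set(predicted)
    tp = fp = fn = 0
    for c in classes:
        for g, p in zip(gold, predicted):
            tp += (g == c and p == c)
            fp += (g != c and p == c)
            fn += (g == c and p != c)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: each item is one period character, labelled either
# as an abbreviation marker ("abbr") or as a sentence end ("end").
annotator_a = ["abbr", "abbr", "end", "end", "abbr", "end", "abbr", "end"]
annotator_b = ["abbr", "abbr", "abbr", "end", "abbr", "end", "abbr", "end"]

print(cohen_kappa(annotator_a, annotator_b))
print(micro_f1(annotator_a, annotator_b))
```

When every item carries exactly one label and the counts are pooled over all classes, as in this sketch, the micro-averaged F-measure equals plain accuracy. Whether the article pooled its counts in exactly this way is an assumption here.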
