Automatic classification of sentences to support Evidence Based Medicine

AIM: Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels. METHOD: We constructed a corpus of 1,000 medical abstracts annotated by hand with specified medical categories (e.g. Intervention, Ou...

Bibliographic Details
Main Authors:	Kim, Su Nam; Martinez, David; Cavedon, Lawrence; Yencken, Lars
Format:	Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3073185/ https://www.ncbi.nlm.nih.gov/pubmed/21489224 http://dx.doi.org/10.1186/1471-2105-12-S2-S5

author	Kim, Su Nam; Martinez, David; Cavedon, Lawrence; Yencken, Lars
author_sort	Kim, Su Nam
collection	PubMed
description	AIM: Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels. METHOD: We constructed a corpus of 1,000 medical abstracts annotated by hand with specified medical categories (e.g. Intervention, Outcome). We explored the use of various features based on lexical, semantic, structural, and sequential information in the data, using Conditional Random Fields (CRF) for classification. RESULTS: For the classification tasks over all labels, our systems achieved micro-averaged f-scores of 80.9% and 66.9% over datasets of structured and unstructured abstracts respectively, using sequential features. In labeling only the key sentences, our systems produced f-scores of 89.3% and 74.0% over structured and unstructured abstracts respectively, using the same sequential features. The results over an external dataset were lower (f-scores of 63.1% for all labels, and 83.8% for key sentences). CONCLUSIONS: Of the features we used, the best for classifying any given sentence in an abstract were based on unigrams, section headings, and sequential information from preceding sentences. These features resulted in improved performance over a simple bag-of-words approach, and outperformed feature sets used in previous work.
format	Text
id	pubmed-3073185
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3073185 2011-04-12 Automatic classification of sentences to support Evidence Based Medicine Kim, Su Nam; Martinez, David; Cavedon, Lawrence; Yencken, Lars BMC Bioinformatics Proceedings BioMed Central 2011-03-29 /pmc/articles/PMC3073185/ /pubmed/21489224 http://dx.doi.org/10.1186/1471-2105-12-S2-S5 Text en Copyright ©2011 Kim et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Automatic classification of sentences to support Evidence Based Medicine
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3073185/ https://www.ncbi.nlm.nih.gov/pubmed/21489224 http://dx.doi.org/10.1186/1471-2105-12-S2-S5
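
The abstract in this record reports micro-averaged f-scores over sentence labels. As a minimal sketch of how such a score can be computed, the Python below pools true positives, false positives, and false negatives across all sentences before taking precision and recall. The function name `micro_f1`, the toy data, and the treatment of each sentence as carrying a set of labels are assumptions made for illustration; the record does not describe the authors' evaluation code.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over sentences, each given as a collection of labels.

    Counts are pooled over every sentence and label before precision and
    recall are computed, so frequent labels weigh more than rare ones.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, predicted):
        g, p = set(gold_labels), set(pred_labels)
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold annotation
        fn += len(g - p)   # gold labels that were missed

    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data using the two category names given in the abstract.
gold = [{"Intervention"}, {"Outcome"}, {"Intervention", "Outcome"}]
pred = [{"Intervention"}, {"Intervention"}, {"Outcome"}]
score = micro_f1(gold, pred)
print(f"micro-averaged F1 = {score:.3f}")
```

In this toy case the pooled counts are 2 true positives, 1 false positive, and 2 false negatives, giving precision 2/3, recall 1/2, and a micro-averaged F1 of 4/7.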
