Feasibility of pooling annotated corpora for clinical concept extraction

Availability of annotated corpora has facilitated application of machine learning algorithms to concept extraction from clinical notes. However, it is expensive to prepare annotated corpora in individual institutions, and pooling of annotated corpora from other institutions is a potential solution....

Bibliographic Details
Main authors: Wagholikar, Kavishwar; Torii, Manabu; Jonnalagadda, Siddhartha; Liu, Hongfang
Format: Online Article Text
Language: English
Published: American Medical Informatics Association 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392069/
https://www.ncbi.nlm.nih.gov/pubmed/22779047
_version_ 1782237590672375808
author Wagholikar, Kavishwar
Torii, Manabu
Jonnalagadda, Siddhartha
Liu, Hongfang
author_facet Wagholikar, Kavishwar
Torii, Manabu
Jonnalagadda, Siddhartha
Liu, Hongfang
author_sort Wagholikar, Kavishwar
collection PubMed
description Availability of annotated corpora has facilitated application of machine learning algorithms to concept extraction from clinical notes. However, it is expensive to prepare annotated corpora in individual institutions, and pooling of annotated corpora from other institutions is a potential solution. In this paper, we investigate whether pooling corpora from two different sources can improve the performance and portability of the resultant machine learning taggers for medical problem detection. Specifically, we pool corpora from the 2010 i2b2/VA NLP challenge and Mayo Clinic Rochester to evaluate taggers for recognition of medical problems. Contrary to our expectations, pooling of corpora is found to decrease the F1-score. We examine the annotation guidelines to identify factors behind the incompatibility of the corpora and suggest that the clinical NLP community develop a standard annotation guideline to make annotated corpora compatible.
format Online
Article
Text
id pubmed-3392069
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher American Medical Informatics Association
record_format MEDLINE/PubMed
spelling pubmed-33920692012-07-09 Feasibility of pooling annotated corpora for clinical concept extraction Wagholikar, Kavishwar Torii, Manabu Jonnalagadda, Siddhartha Liu, Hongfang AMIA Jt Summits Transl Sci Proc Articles Availability of annotated corpora has facilitated application of machine learning algorithms to concept extraction from clinical notes. However, it is expensive to prepare annotated corpora in individual institutions, and pooling of annotated corpora from other institutions is a potential solution. In this paper, we investigate whether pooling corpora from two different sources can improve the performance and portability of the resultant machine learning taggers for medical problem detection. Specifically, we pool corpora from the 2010 i2b2/VA NLP challenge and Mayo Clinic Rochester to evaluate taggers for recognition of medical problems. Contrary to our expectations, pooling of corpora is found to decrease the F1-score. We examine the annotation guidelines to identify factors behind the incompatibility of the corpora and suggest that the clinical NLP community develop a standard annotation guideline to make annotated corpora compatible. American Medical Informatics Association 2012-03-19 /pmc/articles/PMC3392069/ /pubmed/22779047 Text en ©2012 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose
spellingShingle Articles
Wagholikar, Kavishwar
Torii, Manabu
Jonnalagadda, Siddhartha
Liu, Hongfang
Feasibility of pooling annotated corpora for clinical concept extraction
title Feasibility of pooling annotated corpora for clinical concept extraction
title_full Feasibility of pooling annotated corpora for clinical concept extraction
title_fullStr Feasibility of pooling annotated corpora for clinical concept extraction
title_full_unstemmed Feasibility of pooling annotated corpora for clinical concept extraction
title_short Feasibility of pooling annotated corpora for clinical concept extraction
title_sort feasibility of pooling annotated corpora for clinical concept extraction
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392069/
https://www.ncbi.nlm.nih.gov/pubmed/22779047
work_keys_str_mv AT wagholikarkavishwar feasibilityofpoolingannotatedcorporaforclinicalconceptextraction
AT toriimanabu feasibilityofpoolingannotatedcorporaforclinicalconceptextraction
AT jonnalagaddasiddhartha feasibilityofpoolingannotatedcorporaforclinicalconceptextraction
AT liuhongfang feasibilityofpoolingannotatedcorporaforclinicalconceptextraction