Feasibility of pooling annotated corpora for clinical concept extraction

Bibliographic Details
Main Authors: Wagholikar, Kavishwar; Torii, Manabu; Jonnalagadda, Siddhartha; Liu, Hongfang
Format: Online Article Text
Language: English
Published: American Medical Informatics Association 2012
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392069/
https://www.ncbi.nlm.nih.gov/pubmed/22779047
Description
Summary: The availability of annotated corpora has facilitated the application of machine learning algorithms to concept extraction from clinical notes. However, preparing annotated corpora at individual institutions is expensive, and pooling annotated corpora from other institutions is a potential solution. In this paper we investigate whether pooling corpora from two different sources can improve the performance and portability of the resulting machine learning taggers for medical problem detection. Specifically, we pool corpora from the 2010 i2b2/VA NLP challenge and Mayo Clinic Rochester to evaluate taggers for recognition of medical problems. Contrary to our expectations, pooling the corpora is found to decrease the F1-score. We examine the annotation guidelines to identify factors behind the incompatibility of the corpora and suggest that the clinical NLP community develop a standard annotation guideline to allow compatibility of annotated corpora.