Feasibility of pooling annotated corpora for clinical concept extraction
Availability of annotated corpora has facilitated application of machine learning algorithms to concept extraction from clinical notes. However, it is expensive to prepare annotated corpora in individual institutions, and pooling of annotated corpora from other institutions is a potential solution....
Main authors: | Wagholikar, Kavishwar; Torii, Manabu; Jonnalagadda, Siddhartha; Liu, Hongfang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Medical Informatics Association, 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392069/ https://www.ncbi.nlm.nih.gov/pubmed/22779047 |
_version_ | 1782237590672375808 |
---|---|
author | Wagholikar, Kavishwar Torii, Manabu Jonnalagadda, Siddhartha Liu, Hongfang |
author_facet | Wagholikar, Kavishwar Torii, Manabu Jonnalagadda, Siddhartha Liu, Hongfang |
author_sort | Wagholikar, Kavishwar |
collection | PubMed |
description | Availability of annotated corpora has facilitated application of machine learning algorithms to concept extraction from clinical notes. However, it is expensive to prepare annotated corpora in individual institutions, and pooling of annotated corpora from other institutions is a potential solution. In this paper we investigate whether pooling of corpora from two different sources can improve performance and portability of resultant machine learning taggers for medical problem detection. Specifically, we pool corpora from the 2010 i2b2/VA NLP challenge and Mayo Clinic Rochester to evaluate taggers for recognition of medical problems. Contrary to our expectations, pooling of corpora is found to decrease the F1-score. We examine the annotation guidelines to identify factors for incompatibility of the corpora and suggest development of a standard annotation guideline by the clinical NLP community to allow compatibility of annotated corpora. |
format | Online Article Text |
id | pubmed-3392069 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | American Medical Informatics Association |
record_format | MEDLINE/PubMed |
spelling | pubmed-33920692012-07-09 Feasibility of pooling annotated corpora for clinical concept extraction Wagholikar, Kavishwar Torii, Manabu Jonnalagadda, Siddhartha Liu, Hongfang AMIA Jt Summits Transl Sci Proc Articles Availability of annotated corpora has facilitated application of machine learning algorithms to concept extraction from clinical notes. However, it is expensive to prepare annotated corpora in individual institutions, and pooling of annotated corpora from other institutions is a potential solution. In this paper we investigate whether pooling of corpora from two different sources, can improve performance and portability of resultant machine learning taggers for medical problem detection. Specifically, we pool corpora from 2010 i2b2/VA NLP challenge and Mayo Clinic Rochester, to evaluate taggers for recognition of medical problems. Contrary to our expectations, pooling of corpora is found to decrease the F1-score. We examine the annotation guidelines to identify factors for incompatibility of the corpora and suggest development of a standard annotation guideline by the clinical NLP community to allow compatibility of annotated corpora. American Medical Informatics Association 2012-03-19 /pmc/articles/PMC3392069/ /pubmed/22779047 Text en ©2012 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose |
spellingShingle | Articles Wagholikar, Kavishwar Torii, Manabu Jonnalagadda, Siddhartha Liu, Hongfang Feasibility of pooling annotated corpora for clinical concept extraction |
title | Feasibility of pooling annotated corpora for clinical concept extraction |
title_full | Feasibility of pooling annotated corpora for clinical concept extraction |
title_fullStr | Feasibility of pooling annotated corpora for clinical concept extraction |
title_full_unstemmed | Feasibility of pooling annotated corpora for clinical concept extraction |
title_short | Feasibility of pooling annotated corpora for clinical concept extraction |
title_sort | feasibility of pooling annotated corpora for clinical concept extraction |
topic | Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392069/ https://www.ncbi.nlm.nih.gov/pubmed/22779047 |
work_keys_str_mv | AT wagholikarkavishwar feasibilityofpoolingannotatedcorporaforclinicalconceptextraction AT toriimanabu feasibilityofpoolingannotatedcorporaforclinicalconceptextraction AT jonnalagaddasiddhartha feasibilityofpoolingannotatedcorporaforclinicalconceptextraction AT liuhongfang feasibilityofpoolingannotatedcorporaforclinicalconceptextraction |