Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements

Bibliographic Details
Main authors: Lingren, Todd, Deleger, Louise, Molnar, Katalin, Zhai, Haijun, Meinzen-Derr, Jareen, Kaiser, Megan, Stoutenborough, Laura, Li, Qi, Solti, Imre
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994857/
https://www.ncbi.nlm.nih.gov/pubmed/24001514
http://dx.doi.org/10.1136/amiajnl-2013-001837
collection PubMed
description OBJECTIVE: To present a series of experiments (1) to evaluate the impact of pre-annotation on the speed of manual annotation of clinical trial announcements and (2) to test for potential bias when pre-annotation is used. METHODS: To build the gold standard, 1400 clinical trial announcements from the clinicaltrials.gov website were randomly selected and double annotated for diagnoses, signs, symptoms, Unified Medical Language System (UMLS) Concept Unique Identifiers, and SNOMED CT codes. We used two dictionary-based methods to pre-annotate the text. We evaluated annotation time and potential bias using F-measures and ANOVA tests, applying Bonferroni correction. RESULTS: Time savings ranged from 13.85% to 21.5% per entity. Inter-annotator agreement (IAA) ranged from 93.4% to 95.5%. There was no statistically significant difference in IAA or annotator performance with versus without pre-annotation. CONCLUSIONS: In every experiment pair, the annotator working on pre-annotated text needed less time than the annotator working on unlabeled text, and the time savings were statistically significant. Moreover, pre-annotation did not reduce IAA or annotator performance. Dictionary-based pre-annotation is a feasible and practical method to reduce the cost of annotation for clinical named entity recognition in the eligibility sections of clinical trial announcements without introducing bias into the annotation process.
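The abstract relies on a few computations that are easy to misread: dictionary-based pre-annotation, agreement reported as an F-measure, relative time savings, and a Bonferroni-adjusted threshold. The sketch below illustrates them under stated assumptions. This record does not describe the authors' two dictionary methods or their span-matching criteria, so the greedy longest-match lookup, the exact-span agreement rule, the helper names (`dictionary_preannotate`, `pairwise_f_measure`, `percent_time_saving`, `bonferroni_alpha`), the lexicon, and the example sentence are all hypothetical toy inputs, not the study's method or data. The ANOVA step is not shown.

```python
import re
from typing import Dict, List, Set, Tuple

# (start, end, label): character offsets into the text plus an entity label.
Span = Tuple[int, int, str]


def dictionary_preannotate(text: str, lexicon: Dict[str, str]) -> List[Span]:
    """Hypothetical dictionary pre-annotator: case-insensitive, whole-word,
    greedy longest-match lookup of lexicon terms."""
    spans: List[Span] = []
    taken = [False] * len(text)
    # Longer terms first, so "type 2 diabetes" wins over the nested "diabetes".
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Skip matches overlapping characters already claimed by a longer term.
            if any(taken[m.start():m.end()]):
                continue
            spans.append((m.start(), m.end(), lexicon[term]))
            for i in range(m.start(), m.end()):
                taken[i] = True
    return sorted(spans)


def pairwise_f_measure(a: Set[Span], b: Set[Span]) -> float:
    """IAA as an F-measure under exact matching (same offsets and label).
    With exact matching, F = 2|A & B| / (|A| + |B|), which is symmetric,
    so it does not matter which annotator is treated as the reference."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def percent_time_saving(t_without: float, t_with: float) -> float:
    """Relative annotation time saved by pre-annotation, as a percentage."""
    return 100.0 * (t_without - t_with) / t_without


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Toy usage (hypothetical sentence and lexicon).
text = "History of type 2 diabetes or chronic kidney disease; no seizures."
lexicon = {
    "type 2 diabetes": "Diagnosis",
    "diabetes": "Diagnosis",
    "chronic kidney disease": "Diagnosis",
    "seizures": "SignOrSymptom",
}
print(dictionary_preannotate(text, lexicon))
# -> [(11, 26, 'Diagnosis'), (30, 52, 'Diagnosis'), (57, 65, 'SignOrSymptom')]
print(percent_time_saving(100.0, 80.0))  # -> 20.0
```

Exact span matching is the strictest agreement criterion; a partial-overlap rule would generally yield a higher IAA, and this record does not say which rule the study applied.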
id pubmed-3994857
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3994857 2014-04-22 J Am Med Inform Assoc Research and Applications
BMJ Publishing Group 2014-05 2013-09-03 /pmc/articles/PMC3994857/ /pubmed/24001514 http://dx.doi.org/10.1136/amiajnl-2013-001837 Text en Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/
topic Research and Applications