An active learning-enabled annotation system for clinical named entity recognition

Bibliographic Details
Main authors: Chen, Yukun, Lask, Thomas A., Mei, Qiaozhu, Chen, Qingxia, Moon, Sungrim, Wang, Jingqi, Nguyen, Ky, Dawodu, Tolulola, Cohen, Trevor, Denny, Joshua C., Xu, Hua
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5506567/
https://www.ncbi.nlm.nih.gov/pubmed/28699546
http://dx.doi.org/10.1186/s12911-017-0466-9
author Chen, Yukun
Lask, Thomas A.
Mei, Qiaozhu
Chen, Qingxia
Moon, Sungrim
Wang, Jingqi
Nguyen, Ky
Dawodu, Tolulola
Cohen, Trevor
Denny, Joshua C.
Xu, Hua
collection PubMed
description BACKGROUND: Active learning (AL) has shown promising potential to minimize annotation cost while maximizing performance in building statistical natural language processing (NLP) models. However, very few studies have investigated AL in a real-life setting in the medical domain. METHODS: In this study, we developed the first AL-enabled annotation system for clinical named entity recognition (NER) with a novel AL algorithm. In addition to a simulation study evaluating the novel AL algorithm, we conducted user studies in which two nurses used the system, to assess the performance of AL in real-world annotation processes for building clinical NER models. RESULTS: The simulation results show that the novel AL algorithm outperformed a traditional AL algorithm and random sampling. However, the user study told a different story: AL methods did not always perform better than random sampling for different users. CONCLUSIONS: We found that the increased information content of actively selected sentences is strongly offset by the increased time required to annotate them. Moreover, annotation time was not considered in the querying algorithms. Our future work includes developing better AL algorithms that incorporate estimates of annotation time and evaluating the system with a larger number of users.
format Online
Article
Text
id pubmed-5506567
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5506567 2017-07-12. BMC Med Inform Decis Mak. Research. BioMed Central, 2017-07-05. Text en. © The Author(s). 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An active learning-enabled annotation system for clinical named entity recognition
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5506567/
https://www.ncbi.nlm.nih.gov/pubmed/28699546
http://dx.doi.org/10.1186/s12911-017-0466-9