An active learning-enabled annotation system for clinical named entity recognition
BACKGROUND: Active learning (AL) has shown promising potential to minimize annotation cost while maximizing performance in building statistical natural language processing (NLP) models. However, very few studies have investigated AL in a real-life setting in the medical domain. METHODS: In t...
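Main authors: | Chen, Yukun; Lask, Thomas A.; Mei, Qiaozhu; Chen, Qingxia; Moon, Sungrim; Wang, Jingqi; Nguyen, Ky; Dawodu, Tolulola; Cohen, Trevor; Denny, Joshua C.; Xu, Hua |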
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5506567/ https://www.ncbi.nlm.nih.gov/pubmed/28699546 http://dx.doi.org/10.1186/s12911-017-0466-9 |
_version_ | 1783249584419504128 |
---|---|
author | Chen, Yukun Lask, Thomas A. Mei, Qiaozhu Chen, Qingxia Moon, Sungrim Wang, Jingqi Nguyen, Ky Dawodu, Tolulola Cohen, Trevor Denny, Joshua C. Xu, Hua |
author_facet | Chen, Yukun Lask, Thomas A. Mei, Qiaozhu Chen, Qingxia Moon, Sungrim Wang, Jingqi Nguyen, Ky Dawodu, Tolulola Cohen, Trevor Denny, Joshua C. Xu, Hua |
author_sort | Chen, Yukun |
collection | PubMed |
description | BACKGROUND: Active learning (AL) has shown promising potential to minimize annotation cost while maximizing performance in building statistical natural language processing (NLP) models. However, very few studies have investigated AL in a real-life setting in the medical domain. METHODS: In this study, we developed the first AL-enabled annotation system for clinical named entity recognition (NER) with a novel AL algorithm. Besides a simulation study to evaluate the novel AL algorithm, we further conducted user studies with two nurses using this system to assess the performance of AL in real-world annotation processes for building clinical NER models. RESULTS: The simulation results show that the novel AL algorithm outperformed the traditional AL algorithm and random sampling. However, the user study tells a different story: AL methods did not always perform better than random sampling for different users. CONCLUSIONS: We found that the increased information content of actively selected sentences is strongly offset by the increased time required to annotate them. Moreover, the annotation time was not considered in the querying algorithms. Our future work includes developing better AL algorithms that estimate annotation time and evaluating the system with a larger number of users. |
format | Online Article Text |
id | pubmed-5506567 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-55065672017-07-12 An active learning-enabled annotation system for clinical named entity recognition Chen, Yukun Lask, Thomas A. Mei, Qiaozhu Chen, Qingxia Moon, Sungrim Wang, Jingqi Nguyen, Ky Dawodu, Tolulola Cohen, Trevor Denny, Joshua C. Xu, Hua BMC Med Inform Decis Mak Research BACKGROUND: Active learning (AL) has shown the promising potential to minimize the annotation cost while maximizing the performance in building statistical natural language processing (NLP) models. However, very few studies have investigated AL in a real-life setting in medical domain. METHODS: In this study, we developed the first AL-enabled annotation system for clinical named entity recognition (NER) with a novel AL algorithm. Besides the simulation study to evaluate the novel AL algorithm, we further conducted user studies with two nurses using this system to assess the performance of AL in real world annotation processes for building clinical NER models. RESULTS: The simulation results show that the novel AL algorithm outperformed traditional AL algorithm and random sampling. However, the user study tells a different story that AL methods did not always perform better than random sampling for different users. CONCLUSIONS: We found that the increased information content of actively selected sentences is strongly offset by the increased time required to annotate them. Moreover, the annotation time was not considered in the querying algorithms. Our future work includes developing better AL algorithms with the estimation of annotation time and evaluating the system with larger number of users. BioMed Central 2017-07-05 /pmc/articles/PMC5506567/ /pubmed/28699546 http://dx.doi.org/10.1186/s12911-017-0466-9 Text en © The Author(s). 
2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Chen, Yukun Lask, Thomas A. Mei, Qiaozhu Chen, Qingxia Moon, Sungrim Wang, Jingqi Nguyen, Ky Dawodu, Tolulola Cohen, Trevor Denny, Joshua C. Xu, Hua An active learning-enabled annotation system for clinical named entity recognition |
title | An active learning-enabled annotation system for clinical named entity recognition |
title_sort | active learning-enabled annotation system for clinical named entity recognition |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5506567/ https://www.ncbi.nlm.nih.gov/pubmed/28699546 http://dx.doi.org/10.1186/s12911-017-0466-9 |