Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing
BACKGROUND: A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce...
Main Authors: | Haijun Zhai, Todd Lingren, Louise Deleger, Qi Li, Megan Kaiser, Laura Stoutenborough, Imre Solti |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications Inc., 2013 |
Subjects: | Original Paper |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3636329/ https://www.ncbi.nlm.nih.gov/pubmed/23548263 http://dx.doi.org/10.2196/jmir.2426 |
_version_ | 1782267318520250368 |
---|---|
author | Zhai, Haijun Lingren, Todd Deleger, Louise Li, Qi Kaiser, Megan Stoutenborough, Laura Solti, Imre |
author_facet | Zhai, Haijun Lingren, Todd Deleger, Louise Li, Qi Kaiser, Megan Stoutenborough, Laura Solti, Imre |
author_sort | Zhai, Haijun |
collection | PubMed |
description | BACKGROUND: A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of the crowdsourced biomedical NLP corpora was never exceptional when compared to traditionally-developed gold standards. The previously reported results on a medical named entity annotation task showed a 0.68 F-measure-based agreement between crowdsourced and traditionally-developed corpora. OBJECTIVE: Building upon previous work from general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain with special emphasis on achieving high agreement between crowdsourced and traditionally-developed corpora. METHODS: To build the gold standard for evaluating the crowdsourcing workers’ performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd’s work and tested the statistical significance (P<.001, chi-square test) to detect differences between the crowdsourced and traditionally-developed annotations. RESULTS: The agreement between the crowd’s annotations and the traditionally-generated corpora was high for: (1) annotations (0.87, F-measure for medication names; 0.73, medication types), (2) correction of previous annotations (0.90, medication names; 0.76, medication types), and excellent for (3) linking medications with their attributes (0.96). Simple voting provided the best judgment aggregation approach. There was no statistically significant difference between the crowd and traditionally-generated corpora. Our results showed a 27.9% improvement over previously reported results on the medication named entity annotation task. CONCLUSIONS: This study offers three contributions. First, we proved that crowdsourcing is a feasible, inexpensive, fast, and practical approach to collect high-quality annotations for clinical text (when protected health information was excluded). We believe that well-designed user interfaces and a rigorous quality control strategy for entity annotation and linking were critical to the success of this work. Second, as a further contribution to the Internet-based crowdsourcing field, we will publicly release the JavaScript and CrowdFlower Markup Language infrastructure code that is necessary to utilize CrowdFlower’s quality control and crowdsourcing interfaces for named entity annotations. Finally, to spur future research, we will release the CTA annotations that were generated by traditional and crowdsourced approaches. |
format | Online Article Text |
id | pubmed-3636329 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | JMIR Publications Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-3636329 2013-04-26 Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing Zhai, Haijun Lingren, Todd Deleger, Louise Li, Qi Kaiser, Megan Stoutenborough, Laura Solti, Imre J Med Internet Res Original Paper BACKGROUND: A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of the crowdsourced biomedical NLP corpora was never exceptional when compared to traditionally-developed gold standards. The previously reported results on a medical named entity annotation task showed a 0.68 F-measure-based agreement between crowdsourced and traditionally-developed corpora. OBJECTIVE: Building upon previous work from general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain with special emphasis on achieving high agreement between crowdsourced and traditionally-developed corpora. METHODS: To build the gold standard for evaluating the crowdsourcing workers’ performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd’s work and tested the statistical significance (P<.001, chi-square test) to detect differences between the crowdsourced and traditionally-developed annotations. RESULTS: The agreement between the crowd’s annotations and the traditionally-generated corpora was high for: (1) annotations (0.87, F-measure for medication names; 0.73, medication types), (2) correction of previous annotations (0.90, medication names; 0.76, medication types), and excellent for (3) linking medications with their attributes (0.96). Simple voting provided the best judgment aggregation approach. There was no statistically significant difference between the crowd and traditionally-generated corpora. Our results showed a 27.9% improvement over previously reported results on the medication named entity annotation task. CONCLUSIONS: This study offers three contributions. First, we proved that crowdsourcing is a feasible, inexpensive, fast, and practical approach to collect high-quality annotations for clinical text (when protected health information was excluded). We believe that well-designed user interfaces and a rigorous quality control strategy for entity annotation and linking were critical to the success of this work. Second, as a further contribution to the Internet-based crowdsourcing field, we will publicly release the JavaScript and CrowdFlower Markup Language infrastructure code that is necessary to utilize CrowdFlower’s quality control and crowdsourcing interfaces for named entity annotations. Finally, to spur future research, we will release the CTA annotations that were generated by traditional and crowdsourced approaches. JMIR Publications Inc. 2013-04-02 /pmc/articles/PMC3636329/ /pubmed/23548263 http://dx.doi.org/10.2196/jmir.2426 Text en ©Haijun Zhai, Todd Lingren, Louise Deleger, Qi Li, Megan Kaiser, Laura Stoutenborough, Imre Solti. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 02.04.2013. http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Zhai, Haijun Lingren, Todd Deleger, Louise Li, Qi Kaiser, Megan Stoutenborough, Laura Solti, Imre Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing |
title | Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing |
title_full | Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing |
title_fullStr | Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing |
title_full_unstemmed | Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing |
title_short | Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing |
title_sort | web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3636329/ https://www.ncbi.nlm.nih.gov/pubmed/23548263 http://dx.doi.org/10.2196/jmir.2426 |
work_keys_str_mv | AT zhaihaijun web20basedcrowdsourcingforhighqualitygoldstandarddevelopmentinclinicalnaturallanguageprocessing AT lingrentodd web20basedcrowdsourcingforhighqualitygoldstandarddevelopmentinclinicalnaturallanguageprocessing AT delegerlouise web20basedcrowdsourcingforhighqualitygoldstandarddevelopmentinclinicalnaturallanguageprocessing AT liqi web20basedcrowdsourcingforhighqualitygoldstandarddevelopmentinclinicalnaturallanguageprocessing AT kaisermegan web20basedcrowdsourcingforhighqualitygoldstandarddevelopmentinclinicalnaturallanguageprocessing AT stoutenboroughlaura web20basedcrowdsourcingforhighqualitygoldstandarddevelopmentinclinicalnaturallanguageprocessing AT soltiimre web20basedcrowdsourcingforhighqualitygoldstandarddevelopmentinclinicalnaturallanguageprocessing |
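
The METHODS and RESULTS in the abstract above describe aggregating redundant crowd judgments by simple voting and measuring agreement with the traditionally-developed gold standard using sensitivity, precision, and F-measure. The following minimal Python sketch illustrates one way such voting and scoring could be computed; it is not the authors' released JavaScript/CrowdFlower Markup Language infrastructure code, and the span representation, the `majority_vote` and `prf` helpers, and the two-vote threshold are hypothetical assumptions for the example.

```python
# Illustrative sketch: aggregate crowd judgments by simple majority voting,
# then score the aggregated annotations against a gold standard with
# precision, sensitivity (recall), and F-measure over exact entity spans.
# This is NOT the authors' released code; all names and data are hypothetical.

from collections import Counter
from typing import List, Set, Tuple

# An entity annotation is modeled here as (start_offset, end_offset, label),
# e.g. (12, 21, "medication_name"); the released corpora may use another format.
Span = Tuple[int, int, str]


def majority_vote(judgments: List[Set[Span]], min_votes: int) -> Set[Span]:
    """Keep a span if at least `min_votes` workers marked it (simple voting)."""
    counts = Counter(span for worker_spans in judgments for span in worker_spans)
    return {span for span, votes in counts.items() if votes >= min_votes}


def prf(crowd: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Precision, sensitivity (recall), and F-measure for exact span matches."""
    tp = len(crowd & gold)
    precision = tp / len(crowd) if crowd else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Hypothetical example: three workers annotate one clinical trial announcement.
    worker_judgments = [
        {(12, 21, "medication_name"), (30, 38, "medication_type")},
        {(12, 21, "medication_name")},
        {(12, 21, "medication_name"), (50, 55, "medication_name")},
    ]
    gold_standard = {(12, 21, "medication_name"), (30, 38, "medication_type")}

    aggregated = majority_vote(worker_judgments, min_votes=2)
    p, r, f = prf(aggregated, gold_standard)
    print(f"precision={p:.2f} sensitivity={r:.2f} F-measure={f:.2f}")
```

Exact span matching is the strictest comparison; the record does not state the matching criteria used in the study, so an overlap-based variant would be an equally reasonable choice for the sketch.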