
Identifying medical terms in patient-authored text: a crowdsourcing-based approach

BACKGROUND AND OBJECTIVE: As people increasingly engage in online health-seeking behavior and contribute to health-oriented websites, the volume of medical text authored by patients and other medical novices grows rapidly. However, we lack an effective method for automatically identifying medical terms in patient-authored text (PAT). We demonstrate that crowdsourcing PAT medical term identification tasks to non-experts is a viable method for creating large, accurately-labeled PAT datasets; moreover, such datasets can be used to train classifiers that outperform existing medical term identification tools. MATERIALS AND METHODS: To evaluate the viability of using non-expert crowds to label PAT, we compare expert (registered nurses) and non-expert (Amazon Mechanical Turk workers; Turkers) responses to a PAT medical term identification task. Next, we build a crowd-labeled dataset comprising 10 000 sentences from MedHelp. We train two models on this dataset and evaluate their performance, as well as that of MetaMap, Open Biomedical Annotator (OBA), and NaCTeM's TerMINE, against two gold standard datasets: one from MedHelp and the other from CureTogether. RESULTS: When aggregated according to a corroborative voting policy, Turker responses predict expert responses with an F1 score of 84%. A conditional random field (CRF) trained on 10 000 crowd-labeled MedHelp sentences achieves an F1 score of 78% against the CureTogether gold standard, widely outperforming OBA (47%), TerMINE (43%), and MetaMap (39%). A failure analysis of the CRF suggests that misclassified terms are likely to be either generic or rare. CONCLUSIONS: Our results show that combining statistical models sensitive to sentence-level context with crowd-labeled data is a scalable and effective technique for automatically identifying medical terms in PAT.
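
The record does not spell out the "corroborative voting policy" beyond its name, but a natural reading is that a token counts as a medical term only when at least k of the Turkers who saw it marked it. The Python sketch below is a minimal illustration under that assumption (the 2-vote threshold, function names, and toy sentence are all hypothetical, not taken from the paper); it aggregates per-token crowd labels and scores the result against expert labels with the token-level F1 the abstract reports.

def corroborative_vote(worker_labels, min_votes=2):
    """Aggregate per-token binary labels from several Turkers.

    worker_labels: one label sequence per worker; each sequence marks
    tokens as 1 (medical term) or 0 (not). A token is accepted only
    when at least `min_votes` workers corroborate it.
    """
    n_tokens = len(worker_labels[0])
    votes = [sum(seq[i] for seq in worker_labels) for i in range(n_tokens)]
    return [1 if v >= min_votes else 0 for v in votes]

def f1_score(predicted, gold):
    """Token-level F1 of predicted binary labels against a gold standard."""
    tp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(predicted, gold) if p == 0 and g == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: three Turkers label "my migraine meds cause nausea".
turkers = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
]
crowd = corroborative_vote(turkers, min_votes=2)  # -> [0, 1, 1, 0, 1]
expert = [0, 1, 0, 0, 1]                          # registered-nurse labels
print(f1_score(crowd, expert))                    # -> 0.8

Requiring corroboration trades recall for precision: no single over-eager worker can push a spurious term into the aggregate labels.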


Bibliographic Details
Main Authors: MacLean, Diana Lynn, Heer, Jeffrey
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2013
Subjects: Focus on Patient Care
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3822103/
https://www.ncbi.nlm.nih.gov/pubmed/23645553
http://dx.doi.org/10.1136/amiajnl-2012-001110
_version_ 1782290385536548864
author MacLean, Diana Lynn
Heer, Jeffrey
author_facet MacLean, Diana Lynn
Heer, Jeffrey
author_sort MacLean, Diana Lynn
collection PubMed
description BACKGROUND AND OBJECTIVE: As people increasingly engage in online health-seeking behavior and contribute to health-oriented websites, the volume of medical text authored by patients and other medical novices grows rapidly. However, we lack an effective method for automatically identifying medical terms in patient-authored text (PAT). We demonstrate that crowdsourcing PAT medical term identification tasks to non-experts is a viable method for creating large, accurately-labeled PAT datasets; moreover, such datasets can be used to train classifiers that outperform existing medical term identification tools. MATERIALS AND METHODS: To evaluate the viability of using non-expert crowds to label PAT, we compare expert (registered nurses) and non-expert (Amazon Mechanical Turk workers; Turkers) responses to a PAT medical term identification task. Next, we build a crowd-labeled dataset comprising 10 000 sentences from MedHelp. We train two models on this dataset and evaluate their performance, as well as that of MetaMap, Open Biomedical Annotator (OBA), and NaCTeM's TerMINE, against two gold standard datasets: one from MedHelp and the other from CureTogether. RESULTS: When aggregated according to a corroborative voting policy, Turker responses predict expert responses with an F1 score of 84%. A conditional random field (CRF) trained on 10 000 crowd-labeled MedHelp sentences achieves an F1 score of 78% against the CureTogether gold standard, widely outperforming OBA (47%), TerMINE (43%), and MetaMap (39%). A failure analysis of the CRF suggests that misclassified terms are likely to be either generic or rare. CONCLUSIONS: Our results show that combining statistical models sensitive to sentence-level context with crowd-labeled data is a scalable and effective technique for automatically identifying medical terms in PAT.
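
The description above credits the best results to a conditional random field trained on crowd-labeled sentences. This record names neither the CRF toolkit nor the feature set, so the sketch below shows only the general technique - BIO-tagged labels plus simple sentence-level context features - using the open-source sklearn-crfsuite package as one possible implementation. The features and the toy training sentence are illustrative assumptions.

import sklearn_crfsuite  # pip install sklearn-crfsuite

def token_features(sent, i):
    """Contextual features for token i (an illustrative set, not the
    paper's published features)."""
    word = sent[i]
    feats = {
        "word.lower": word.lower(),
        "word.suffix3": word[-3:],
        "word.isdigit": word.isdigit(),
    }
    if i > 0:
        feats["prev.lower"] = sent[i - 1].lower()  # left context
    else:
        feats["BOS"] = True                        # beginning of sentence
    if i < len(sent) - 1:
        feats["next.lower"] = sent[i + 1].lower()  # right context
    else:
        feats["EOS"] = True                        # end of sentence
    return feats

# Toy crowd-labeled data in BIO form: B-TERM starts a medical term.
sentences = [["my", "migraine", "meds", "cause", "nausea"]]
labels = [["O", "B-TERM", "O", "O", "B-TERM"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
crf.fit(X, labels)
print(crf.predict(X))

Because each token's features include its neighbors and the CRF models label transitions jointly, the classifier can exploit sentence-level context that dictionary lookups such as MetaMap and OBA ignore, which is consistent with the performance gap reported above.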
format Online
Article
Text
id pubmed-3822103
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-3822103 2013-12-11 Identifying medical terms in patient-authored text: a crowdsourcing-based approach MacLean, Diana Lynn Heer, Jeffrey J Am Med Inform Assoc Focus on Patient Care BACKGROUND AND OBJECTIVE: As people increasingly engage in online health-seeking behavior and contribute to health-oriented websites, the volume of medical text authored by patients and other medical novices grows rapidly. However, we lack an effective method for automatically identifying medical terms in patient-authored text (PAT). We demonstrate that crowdsourcing PAT medical term identification tasks to non-experts is a viable method for creating large, accurately-labeled PAT datasets; moreover, such datasets can be used to train classifiers that outperform existing medical term identification tools. MATERIALS AND METHODS: To evaluate the viability of using non-expert crowds to label PAT, we compare expert (registered nurses) and non-expert (Amazon Mechanical Turk workers; Turkers) responses to a PAT medical term identification task. Next, we build a crowd-labeled dataset comprising 10 000 sentences from MedHelp. We train two models on this dataset and evaluate their performance, as well as that of MetaMap, Open Biomedical Annotator (OBA), and NaCTeM's TerMINE, against two gold standard datasets: one from MedHelp and the other from CureTogether. RESULTS: When aggregated according to a corroborative voting policy, Turker responses predict expert responses with an F1 score of 84%. A conditional random field (CRF) trained on 10 000 crowd-labeled MedHelp sentences achieves an F1 score of 78% against the CureTogether gold standard, widely outperforming OBA (47%), TerMINE (43%), and MetaMap (39%). A failure analysis of the CRF suggests that misclassified terms are likely to be either generic or rare. CONCLUSIONS: Our results show that combining statistical models sensitive to sentence-level context with crowd-labeled data is a scalable and effective technique for automatically identifying medical terms in PAT. BMJ Publishing Group 2013-11 2013-05-05 /pmc/articles/PMC3822103/ /pubmed/23645553 http://dx.doi.org/10.1136/amiajnl-2012-001110 Text en Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/
spellingShingle Focus on Patient Care
MacLean, Diana Lynn
Heer, Jeffrey
Identifying medical terms in patient-authored text: a crowdsourcing-based approach
title Identifying medical terms in patient-authored text: a crowdsourcing-based approach
title_full Identifying medical terms in patient-authored text: a crowdsourcing-based approach
title_fullStr Identifying medical terms in patient-authored text: a crowdsourcing-based approach
title_full_unstemmed Identifying medical terms in patient-authored text: a crowdsourcing-based approach
title_short Identifying medical terms in patient-authored text: a crowdsourcing-based approach
title_sort identifying medical terms in patient-authored text: a crowdsourcing-based approach
topic Focus on Patient Care
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3822103/
https://www.ncbi.nlm.nih.gov/pubmed/23645553
http://dx.doi.org/10.1136/amiajnl-2012-001110
work_keys_str_mv AT macleandianalynn identifyingmedicaltermsinpatientauthoredtextacrowdsourcingbasedapproach
AT heerjeffrey identifyingmedicaltermsinpatientauthoredtextacrowdsourcingbasedapproach