Cargando…

Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

BACKGROUND: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. OBJECTIVE: The objective of the study was to evaluate the accuracy of machine learn...

Descripción completa

Detalles Bibliográficos
Autores principales:	Gibbons, Chris, Richards, Suzanne, Valderas, Jose Maria, Campbell, John
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	JMIR Publications 2017
Materias:	Original Paper
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5371715/ https://www.ncbi.nlm.nih.gov/pubmed/28298265 http://dx.doi.org/10.2196/jmir.6533

_version_	1782518476458426368
author	Gibbons, Chris Richards, Suzanne Valderas, Jose Maria Campbell, John
author_facet	Gibbons, Chris Richards, Suzanne Valderas, Jose Maria Campbell, John
author_sort	Gibbons, Chris
collection	PubMed
description	BACKGROUND: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. OBJECTIVE: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom. METHODS: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. RESULTS: Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). CONCLUSIONS: Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high performance. Colleague open-text comments that signal respect, professionalism, and being interpersonal may be key indicators of doctor’s performance.
format	Online Article Text
id	pubmed-5371715
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-53717152017-04-10 Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy Gibbons, Chris Richards, Suzanne Valderas, Jose Maria Campbell, John J Med Internet Res Original Paper BACKGROUND: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. OBJECTIVE: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom. METHODS: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. RESULTS: Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). CONCLUSIONS: Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high performance. Colleague open-text comments that signal respect, professionalism, and being interpersonal may be key indicators of doctor’s performance. JMIR Publications 2017-03-15 /pmc/articles/PMC5371715/ /pubmed/28298265 http://dx.doi.org/10.2196/jmir.6533 Text en ©Chris Gibbons, Suzanne Richards, Jose Maria Valderas, John Campbell. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.03.2017. http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle	Original Paper Gibbons, Chris Richards, Suzanne Valderas, Jose Maria Campbell, John Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy
title	Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy
title_full	Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy
title_fullStr	Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy
title_full_unstemmed	Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy
title_short	Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy
title_sort	supervised machine learning algorithms can classify open-text feedback of doctor performance with human-level accuracy
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5371715/ https://www.ncbi.nlm.nih.gov/pubmed/28298265 http://dx.doi.org/10.2196/jmir.6533
work_keys_str_mv	AT gibbonschris supervisedmachinelearningalgorithmscanclassifyopentextfeedbackofdoctorperformancewithhumanlevelaccuracy AT richardssuzanne supervisedmachinelearningalgorithmscanclassifyopentextfeedbackofdoctorperformancewithhumanlevelaccuracy AT valderasjosemaria supervisedmachinelearningalgorithmscanclassifyopentextfeedbackofdoctorperformancewithhumanlevelaccuracy AT campbelljohn supervisedmachinelearningalgorithmscanclassifyopentextfeedbackofdoctorperformancewithhumanlevelaccuracy

Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

Ejemplares similares