Harnessing Natural Language Processing to Support Decisions Around Workplace-Based Assessment: Machine Learning Study of Competency-Based Medical Education

Bibliographic Details
Main authors:	Yilmaz, Yusuf; Jurado Nunez, Alma; Ariaeinejad, Ali; Lee, Mark; Sherbino, Jonathan; Chan, Teresa M
Format:	Online Article Text
Language:	English
Published:	JMIR Publications, 2022
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9187970/ https://www.ncbi.nlm.nih.gov/pubmed/35622398 http://dx.doi.org/10.2196/30537

collection	PubMed
description	BACKGROUND: Residents receive a numeric performance rating (eg, on a 1-7 scale) along with narrative (ie, qualitative) feedback on their performance in each workplace-based assessment (WBA). Aggregated qualitative data from WBAs can be overwhelming to process and to adjudicate fairly as part of a global decision about learner competence. Current approaches to qualitative data require a human rater to maintain attention and appropriately weigh various data inputs within the constraints of working memory before rendering a global judgment of performance. OBJECTIVE: This study explores natural language processing (NLP) and machine learning (ML) applications for identifying at-risk trainees using a large data set of WBA narrative comments associated with numerical ratings. METHODS: NLP was performed retrospectively on a complete data set of narrative comments (ie, text-based feedback to residents on their performance of a task) derived from WBAs completed by faculty members at multiple hospitals associated with a single large residency program at McMaster University, Canada. Narrative comments were converted into numerical vectors using the bag-of-n-grams technique with 3 input types: unigrams, bigrams, and trigrams. Supervised ML models using linear regression were trained on these vectors against the quantitative ratings, performed binary classification, and output a prediction of whether a resident fell into the at-risk or not-at-risk category. Sensitivity, specificity, and accuracy metrics are reported. RESULTS: The database comprised 7199 unique direct observation assessments containing both narrative comments and a rating between 3 and 7, with an imbalanced distribution (scores 3-5: 726 ratings; scores 6-7: 4871 ratings). A total of 141 unique raters from 5 different hospitals and 45 unique residents participated over the course of 5 academic years. When comparing the 3 input types for predicting whether a trainee would be rated low (ie, 1-5) or high (ie, 6 or 7), accuracy was 87% for trigrams, 86% for bigrams, and 82% for unigrams. All 3 input types also had better prediction accuracy with a binary cut (ie, low or high) than when predicting performance along the full 7-point rating scale (50%-52%). CONCLUSIONS: The ML models can accurately identify underperforming residents from the narrative comments provided in WBAs. The words generated in WBAs can be a valuable data set to augment human decisions for educators tasked with processing large volumes of narrative assessments.
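The pipeline outlined in METHODS (bag-of-n-grams features, a linear regression model, a low/high cut, and sensitivity, specificity, and accuracy) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the toy comments and ratings are invented, the model is ordinary least squares fitted by batch gradient descent, the 5.5 cutoff on the continuous prediction is an assumed midpoint between "low" (1-5) and "high" (6-7), and all function names are hypothetical.

```python
from collections import Counter

# Assumed cutoff on the continuous predicted rating: midpoint between
# "low" (1-5, at risk) and "high" (6-7, not at risk).
AT_RISK_CUTOFF = 5.5


def ngrams(text, n):
    """Word n-grams of size n from a lower-cased, whitespace-tokenized comment."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocab(texts, n):
    """Map every n-gram seen in the training comments to a column index."""
    grams = sorted({g for t in texts for g in ngrams(t, n)})
    return {g: i for i, g in enumerate(grams)}


def vectorize(text, vocab, n):
    """Bag-of-n-grams count vector; n-grams missing from the vocabulary are ignored."""
    row = [0.0] * len(vocab)
    for g, k in Counter(ngrams(text, n)).items():
        if g in vocab:
            row[vocab[g]] = float(k)
    return row


def fit_linear_regression(X, y, lr=0.05, epochs=2000):
    """Least-squares linear regression fitted by batch gradient descent."""
    m, d = len(X), len(X[0])
    w, b = [0.0] * d, sum(y) / m  # intercept starts at the mean rating
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_at_risk(x, w, b):
    """True when the predicted rating falls below the assumed cutoff."""
    return b + sum(wj * xj for wj, xj in zip(w, x)) < AT_RISK_CUTOFF


def metrics(actual, predicted):
    """Sensitivity, specificity, and accuracy with 'at risk' as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    tn = sum(not a and not p for a, p in pairs)
    fp = sum(not a and p for a, p in pairs)
    fn = sum(a and not p for a, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity, (tp + tn) / len(pairs)


# Invented toy comments and ratings (not study data).
comments = [
    "needs to improve history taking",
    "unsafe plan needs supervision",
    "needs more structure in handover",
    "excellent independent management of patient",
    "excellent communication and clear plan",
    "independent and efficient in resuscitation",
]
ratings = [4, 3, 5, 7, 7, 6]
actual = [r <= 5 for r in ratings]  # low rating = at risk

results = {}
for n in (1, 2, 3):  # unigrams, bigrams, trigrams
    vocab = build_vocab(comments, n)
    X = [vectorize(c, vocab, n) for c in comments]
    w, b = fit_linear_regression(X, ratings)
    flags = [predict_at_risk(x, w, b) for x in X]
    results[n] = metrics(actual, flags)  # in-sample only
    print(n, results[n])
```

Because the toy model is scored on its own training comments, its perfect in-sample metrics say nothing about the accuracies reported above; a real evaluation would use held-out assessments and would need to account for the imbalance between low and high ratings.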
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
©Yusuf Yilmaz, Alma Jurado Nunez, Ali Ariaeinejad, Mark Lee, Jonathan Sherbino, Teresa M Chan. Originally published in JMIR Medical Education (https://mededu.jmir.org), 27.05.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited. The complete bibliographic information, a link to the original publication on https://mededu.jmir.org/, as well as this copyright and license information must be included.