
Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study

OBJECTIVES: Artificial intelligence companies have been increasing their initiatives recently to improve the results of chatbots, which are software programs that can converse with a human in natural language. The role of chatbots in health care is deemed worthy of research. OpenAI's ChatGPT is...


Bibliographic Details
Main Authors:	Sarbay, İbrahim; Berikol, Göksu Bozdereli; Özturan, İbrahim Ulaş
Format:	Online Article Text
Language:	English
Published:	Wolters Kluwer - Medknow 2023
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10389099/ https://www.ncbi.nlm.nih.gov/pubmed/37529789 http://dx.doi.org/10.4103/tjem.tjem_79_23

_version_	1785082224226336768
author	Sarbay, İbrahim; Berikol, Göksu Bozdereli; Özturan, İbrahim Ulaş
author_facet	Sarbay, İbrahim; Berikol, Göksu Bozdereli; Özturan, İbrahim Ulaş
author_sort	Sarbay, İbrahim
collection	PubMed
description	OBJECTIVES: Artificial intelligence companies have been increasing their initiatives recently to improve the results of chatbots, which are software programs that can converse with a human in natural language. The role of chatbots in health care is deemed worthy of research. OpenAI's ChatGPT is a supervised and empowered machine learning-based chatbot. The aim of this study was to determine the performance of ChatGPT in emergency medicine (EM) triage prediction. METHODS: This was a preliminary, cross-sectional study conducted with case scenarios generated by the researchers based on the emergency severity index (ESI) handbook v4 cases. Two independent EM specialists who were experts in the ESI triage scale determined the triage categories for each case. A third independent EM specialist was consulted as arbiter, if necessary. Consensus results for each case scenario were assumed as the reference triage category. Subsequently, each case scenario was queried with ChatGPT and the answer was recorded as the index triage category. Inconsistent classifications between the ChatGPT and reference category were defined as over-triage (false positive) or under-triage (false negative). RESULTS: Fifty case scenarios were assessed in the study. Reliability analysis showed a fair agreement between EM specialists and ChatGPT (Cohen's Kappa: 0.341). Eleven cases (22%) were over-triaged and 9 cases (18%) were under-triaged by ChatGPT. In 9 cases (18%), ChatGPT reported two consecutive triage categories, one of which matched the expert consensus. It had an overall sensitivity of 57.1% (95% confidence interval [CI]: 34–78.2), specificity of 34.5% (95% CI: 17.9–54.3), positive predictive value (PPV) of 38.7% (95% CI: 21.8–57.8), negative predictive value (NPV) of 52.6% (95% CI: 28.9–75.6), and an F1 score of 0.461. In high acuity cases (ESI-1 and ESI-2), ChatGPT showed a sensitivity of 76.2% (95% CI: 52.8–91.8), specificity of 93.1% (95% CI: 77.2–99.2), PPV of 88.9% (95% CI: 65.3–98.6), NPV of 84.4% (95% CI: 67.2–94.7), and an F1 score of 0.821. The receiver operating characteristic curve showed an area under the curve of 0.846 (95% CI: 0.724–0.969, P < 0.001) for high acuity cases. CONCLUSION: The performance of ChatGPT was best when predicting high acuity cases (ESI-1 and ESI-2). It may be useful when determining the cases requiring critical care. When trained with more medical knowledge, ChatGPT may be more accurate for other triage category predictions.
format	Online Article Text
id	pubmed-10389099
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Wolters Kluwer - Medknow
record_format	MEDLINE/PubMed
spelling	pubmed-10389099 2023-08-01 Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study Sarbay, İbrahim Berikol, Göksu Bozdereli Özturan, İbrahim Ulaş Turk J Emerg Med Original Article OBJECTIVES: Artificial intelligence companies have been increasing their initiatives recently to improve the results of chatbots, which are software programs that can converse with a human in natural language. The role of chatbots in health care is deemed worthy of research. OpenAI's ChatGPT is a supervised and empowered machine learning-based chatbot. The aim of this study was to determine the performance of ChatGPT in emergency medicine (EM) triage prediction. METHODS: This was a preliminary, cross-sectional study conducted with case scenarios generated by the researchers based on the emergency severity index (ESI) handbook v4 cases. Two independent EM specialists who were experts in the ESI triage scale determined the triage categories for each case. A third independent EM specialist was consulted as arbiter, if necessary. Consensus results for each case scenario were assumed as the reference triage category. Subsequently, each case scenario was queried with ChatGPT and the answer was recorded as the index triage category. Inconsistent classifications between the ChatGPT and reference category were defined as over-triage (false positive) or under-triage (false negative). RESULTS: Fifty case scenarios were assessed in the study. Reliability analysis showed a fair agreement between EM specialists and ChatGPT (Cohen's Kappa: 0.341). Eleven cases (22%) were over-triaged and 9 cases (18%) were under-triaged by ChatGPT. In 9 cases (18%), ChatGPT reported two consecutive triage categories, one of which matched the expert consensus. It had an overall sensitivity of 57.1% (95% confidence interval [CI]: 34–78.2), specificity of 34.5% (95% CI: 17.9–54.3), positive predictive value (PPV) of 38.7% (95% CI: 21.8–57.8), negative predictive value (NPV) of 52.6% (95% CI: 28.9–75.6), and an F1 score of 0.461. In high acuity cases (ESI-1 and ESI-2), ChatGPT showed a sensitivity of 76.2% (95% CI: 52.8–91.8), specificity of 93.1% (95% CI: 77.2–99.2), PPV of 88.9% (95% CI: 65.3–98.6), NPV of 84.4% (95% CI: 67.2–94.7), and an F1 score of 0.821. The receiver operating characteristic curve showed an area under the curve of 0.846 (95% CI: 0.724–0.969, P < 0.001) for high acuity cases. CONCLUSION: The performance of ChatGPT was best when predicting high acuity cases (ESI-1 and ESI-2). It may be useful when determining the cases requiring critical care. When trained with more medical knowledge, ChatGPT may be more accurate for other triage category predictions. Wolters Kluwer - Medknow 2023-06-26 /pmc/articles/PMC10389099/ /pubmed/37529789 http://dx.doi.org/10.4103/tjem.tjem_79_23 Text en Copyright: © 2023 Turkish Journal of Emergency Medicine https://creativecommons.org/licenses/by-nc-sa/4.0/ This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.
spellingShingle	Original Article Sarbay, İbrahim Berikol, Göksu Bozdereli Özturan, İbrahim Ulaş Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study
title	Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study
title_full	Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study
title_fullStr	Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study
title_full_unstemmed	Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study
title_short	Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study
title_sort	performance of emergency triage prediction of an open access natural language processing based chatbot application (chatgpt): a preliminary, scenario-based cross-sectional study
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10389099/ https://www.ncbi.nlm.nih.gov/pubmed/37529789 http://dx.doi.org/10.4103/tjem.tjem_79_23
work_keys_str_mv	AT sarbayibrahim performanceofemergencytriagepredictionofanopenaccessnaturallanguageprocessingbasedchatbotapplicationchatgptapreliminaryscenariobasedcrosssectionalstudy AT berikolgoksubozdereli performanceofemergencytriagepredictionofanopenaccessnaturallanguageprocessingbasedchatbotapplicationchatgptapreliminaryscenariobasedcrosssectionalstudy AT ozturanibrahimulas performanceofemergencytriagepredictionofanopenaccessnaturallanguageprocessingbasedchatbotapplicationchatgptapreliminaryscenariobasedcrosssectionalstudy
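
Note on the reported metrics: the description field gives sensitivity, specificity, PPV, NPV, and F1 for the high acuity cases (ESI-1 and ESI-2) but not the underlying 2x2 table. The sketch below shows how those five figures follow from confusion-matrix counts. The counts used (16 true positives, 2 false positives, 5 false negatives, 27 true negatives) are an assumption: they were back-calculated here as one set of whole numbers that sums to the 50 scenarios and reproduces the published percentages, and they are not stated in the record. The function name diagnostic_metrics is likewise hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard 2x2 diagnostic accuracy measures, returned as fractions."""
    return {
        "sensitivity": tp / (tp + fn),    # true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# ASSUMED counts for the high acuity analysis, inferred from the
# published percentages; the record itself does not report them.
high_acuity = diagnostic_metrics(tp=16, fp=2, fn=5, tn=27)

for name, value in high_acuity.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, the rounded results agree with the values in the description field (76.2%, 93.1%, 88.9%, 84.4%, and an F1 of 0.821); other count combinations are not ruled out by the abstract alone.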
