102. Assessing ChatGPT Performance in the Brazilian Infectious Disease Specialist Certification Examination
Main authors: Chaves Fernandes, Alexandre; Varela Cavalcanti Souto, Maria Eduarda; Felippe Jabour, Thais Barros; Luz, Kleber G; Pipolo Milan, Eveline
Format: Online Article Text
Language: English
Published: Oxford University Press, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10678998/ http://dx.doi.org/10.1093/ofid/ofad500.018
author: Chaves Fernandes, Alexandre; Varela Cavalcanti Souto, Maria Eduarda; Felippe Jabour, Thais Barros; Luz, Kleber G; Pipolo Milan, Eveline
collection: PubMed
description: BACKGROUND: Advances in artificial intelligence have the potential to impact medical fields, including natural language processing-based models such as ChatGPT. The ability of ChatGPT to provide insightful responses across diverse fields of expertise could assist in medical decision-making and knowledge management processes. ChatGPT has already demonstrated high accuracy on medical examinations such as the USMLE. To explore the potential of this tool in various contexts, our study aimed to evaluate the accuracy of ChatGPT on the 2022 Brazilian Infectious Disease Specialist Certification Examination.

METHODS: We evaluated the performance of GPT-3.5 and GPT-4 on the 2022 Brazilian Infectious Disease Specialist Certification Exam, a theoretical exam consisting of 80 multiple-choice questions with five alternatives each. Each model was given a prompt containing the question statement and the alternatives, and a brief comment on the reasoning behind the answer was requested. Descriptive statistics were used to analyze the absolute number of correct answers for ChatGPT-3.5 and ChatGPT-4. In addition, the degree of correlation between the two models' answers and the trend in performance throughout the test were estimated using Spearman's coefficient and a logistic regression curve, respectively.

RESULTS: Of the 80 questions on the exam, four were excluded because they were invalidated in the final answer key. ChatGPT-3.5 had an accuracy of 53.95% (41/76), whereas ChatGPT-4 had an accuracy of 73.68% (56/76). Spearman's correlation coefficient between the two models was 0.585. There was a slight trend toward improvement in ChatGPT-4's performance throughout the test, as observed in the logistic regression curve.

Comparison of accuracy between ChatGPT-3.5 and ChatGPT-4 [Figure: see text] The graph shows the percentage accuracy of the two GPT models; ChatGPT-4 outperformed ChatGPT-3.5.

Distribution of correct and incorrect responses by ChatGPT-4 in medical test questions [Figure: see text] The graph displays the distribution of responses generated by ChatGPT-4. The logistic regression curve shows a slight upward trend, indicating a small improvement in performance as the questions were answered.

CONCLUSION: ChatGPT-4 achieved performance above the 60% minimum threshold required to pass the certification exam. This indicates that it is a promising technology in various fields, including infectious diseases. However, its potential applications and the associated ethical dilemmas must be thoroughly assessed. This advancement also highlights the need for medical education to concentrate on developing competence, skills, and critical thinking rather than relying solely on memorization.

DISCLOSURES: All Authors: No reported disclosures
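The descriptive statistics reported above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the per-question correctness vectors were not published, so the `pearson` example below uses hypothetical binary vectors. Note that for binary (correct/incorrect) data ranked with midranks, Spearman's coefficient coincides with the Pearson correlation, which is what this sketch computes.

```python
# Sketch of the abstract's descriptive analysis.
# Hypothetical data: the real per-question correctness vectors were not published.
from math import sqrt

def accuracy(correct, total):
    """Percentage of correct answers, rounded to two decimals."""
    return round(100 * correct / total, 2)

def pearson(x, y):
    """Pearson correlation of two equal-length sequences.
    For binary vectors this equals Spearman's coefficient
    computed with midranks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Reported figures: 76 valid questions after 4 exclusions.
print(accuracy(41, 76))  # ChatGPT-3.5 -> 53.95
print(accuracy(56, 76))  # ChatGPT-4   -> 73.68
```

The logistic-regression trend over question position would additionally require fitting correctness against question number (e.g. with `statsmodels.api.Logit`), which is omitted here since the underlying data are unavailable.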
format: Online Article Text
id: pubmed-10678998
institution: National Center for Biotechnology Information
language: English
publishDate: 2023
publisher: Oxford University Press
record_format: MEDLINE/PubMed
Published online by Oxford University Press, 2023-11-27. © The Author(s) 2023. Published by Oxford University Press on behalf of Infectious Diseases Society of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic: Abstract