Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation

OBJECTIVE: The aim of this study was to evaluate the performance of ChatGPT-4.0 in answering the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and as a tool to provide feedback on the quality of the examination. METHODS: A total of two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the outputs with the test solutions, they classified the large language model answers as adequate, inadequate, or indeterminate. In cases of disagreement, they adjudicated and achieved a consensus decision on the ChatGPT accuracy. The performance across medical themes and nullified questions was compared using chi-square statistical analysis. RESULTS: In the Revalida examination, ChatGPT-4.0 answered 71 (87.7%) questions correctly and 10 (12.3%) incorrectly. There was no statistically significant difference in the proportions of correct answers among different medical themes (p=0.4886). The artificial intelligence model had a lower accuracy of 71.4% in nullified questions, with no statistical difference (p=0.241) between non-nullified and nullified groups. CONCLUSION: ChatGPT-4.0 showed satisfactory performance for the 2022 Brazilian National Examination for Medical Degree Revalidation. The large language model exhibited worse performance on subjective questions and public healthcare themes. The results of this study suggested that the overall quality of the Revalida examination questions is satisfactory and corroborates the nullified questions.
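As a rough illustration of the chi-square comparison described in the abstract, the sketch below rebuilds a 2x2 contingency table from the reported percentages and reruns the test. The abstract does not state the exact nullified/non-nullified counts; the assumed split (7 nullified questions, 5 correct, matching the 71.4% figure, and 74 non-nullified questions, 66 correct) is an inference, so the computed p-value is illustrative and need not reproduce the paper's p=0.241.

```python
# Minimal sketch of the nullified vs. non-nullified comparison (assumed counts).
from scipy.stats import chi2_contingency

# Rows: non-nullified, nullified; columns: correct, incorrect.
# These counts are inferred from the reported 87.7% overall and 71.4%
# nullified-question accuracy, not stated explicitly in the abstract.
observed = [
    [66, 8],  # non-nullified: 66 correct, 8 incorrect (assumed)
    [5, 2],   # nullified: 5 correct, 2 incorrect (5/7 = 71.4%)
]

chi2, p, dof, expected = chi2_contingency(observed)

total_correct = 66 + 5                      # 71 correct answers overall
total = sum(sum(row) for row in observed)   # 81 questions overall
print(f"Overall accuracy: {total_correct / total:.1%}")                   # 87.7%
print(f"Chi-square p-value (nullified vs. non-nullified): {p:.3f}")
```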

Bibliographic Details
Main authors: Gobira, Mauro; Nakayama, Luis Filipe; Moreira, Rodrigo; Andrade, Eric; Regatieri, Caio Vinicius Saito; Belfort, Rubens
Format: Online, Article, Text
Language: English
Published: Associação Médica Brasileira, 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10547492/
https://www.ncbi.nlm.nih.gov/pubmed/37792871
http://dx.doi.org/10.1590/1806-9282.20230848
Journal: Rev Assoc Med Bras (1992), Original Article
Published online: 2023-09-25
License: https://creativecommons.org/licenses/by-nc/4.0/ (This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.)