Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study

PURPOSE: This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool for solving biostatistical problems and to identify potential drawbacks of using ChatGPT in medical education, particularly for practical biostatistical problems.

METHODS: In this descriptive study, ChatGPT was tested on biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4).

RESULTS: GPT-3.5 solved 5 practical problems on the first attempt, related to categorical data, cross-sectional studies, measuring reliability, probability properties, and the t-test. It failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 additionally solved a task related to the confidence interval on the first attempt and, with precise guidance and monitoring, solved all questions within 3 attempts.

CONCLUSION: Across the 10 biostatistical problems, both versions performed below average, with correct first-attempt response rates of 5 (GPT-3.5) and 6 (GPT-4) out of 10; GPT-4 provided all correct answers within 3 attempts. These findings indicate that students should be aware that this tool can be wrong even when it performs and explains statistical calculations, and should keep ChatGPT's limitations in mind when incorporating it into medical education.
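The abstract reports only which analyses ChatGPT got right or wrong, but the analyses on which GPT-3.5 failed (the chi-square test and analysis of variance) can be verified independently with standard statistical software. A minimal sketch using SciPy on purely illustrative data (none of these numbers come from the study):

```python
from scipy import stats

# Chi-square test of independence on a hypothetical 2x2 contingency table
table = [[30, 10],
         [20, 25]]
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.4f}")

# One-way ANOVA across three hypothetical treatment groups
g1 = [5.1, 4.9, 5.4, 5.0]
g2 = [5.8, 6.1, 5.9, 6.3]
g3 = [4.2, 4.5, 4.1, 4.4]
f_stat, p_anova = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.3f}, p = {p_anova:.4f}")
```

Re-running a model's reported statistics through a library like this is one practical way for students to catch the kinds of calculation errors the study describes.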


Bibliographic Details
Main Authors: Ignjatović, Aleksandra; Stevanović, Lazar
Format: Online Article (Text)
Language: English
Published: Korea Health Personnel Licensing Examination Institute, 2023
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10646144/
https://www.ncbi.nlm.nih.gov/pubmed/37840252
http://dx.doi.org/10.3352/jeehp.2023.20.28
Source: J Educ Eval Health Prof (Research Article), PMC10646144. Published online 2023-10-16 by the Korea Health Personnel Licensing Examination Institute.
© 2023 Korea Health Personnel Licensing Examination Institute. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.