Cargando…

Assessing ChatGPT’s capacity for clinical decision support in pediatrics: A comparative study with pediatricians using KIDMAP of Rasch analysis

The application of large language models in clinical decision support (CDS) is an area that warrants further investigation. ChatGPT, a prominent large language models developed by OpenAI, has shown promising performance across various domains. However, there is limited research evaluating its use sp...

Descripción completa

Detalles Bibliográficos
Autores principales: Kao, Hsu-Ju, Chien, Tsair-Wei, Wang, Wen-Chung, Chou, Willy, Chow, Julie Chi
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Lippincott Williams & Wilkins 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10289633/
https://www.ncbi.nlm.nih.gov/pubmed/37352054
http://dx.doi.org/10.1097/MD.0000000000034068
Descripción
Sumario:The application of large language models in clinical decision support (CDS) is an area that warrants further investigation. ChatGPT, a prominent large language models developed by OpenAI, has shown promising performance across various domains. However, there is limited research evaluating its use specifically in pediatric clinical decision-making. This study aimed to assess ChatGPT’s potential as a CDS tool in pediatrics by evCDSaluating its performance on 8 common clinical symptom prompts. Study objectives were to answer the 2 research questions: the ChatGPT’s overall grade in a range from A (high) to E (low) compared to a normal sample and the difference in assessment of ChatGPT between 2 pediatricians. METHODS: We compared ChatGPT’s responses to 8 items related to clinical symptoms commonly encountered by pediatricians. Two pediatricians independently assessed the answers provided by ChatGPT in an open-ended format. The scoring system ranged from 0 to 100, which was then transformed into 5 ordinal categories. We simulated 300 virtual students with a normal distribution to provide scores on items based on Rasch rating scale model and their difficulties in a range between −2 to 2.5 logits. Two visual presentations (Wright map and KIDMAP) were generated to answer the 2 research questions outlined in the objectives of the study. RESULTS: The 2 pediatricians’ assessments indicated that ChatGPT’s overall performance corresponded to a grade of C in a range from A to E, with average scores of −0.89 logits and 0.90 logits (=log odds), respectively. The assessments revealed a significant difference in performance between the 2 pediatricians (P < .05), with scores of −0.89 (SE = 0.37) and 0.90 (SE = 0.41) in log odds units (logits in Rasch analysis). CONCLUSION: This study demonstrates the feasibility of utilizing ChatGPT as a CDS tool for patients presenting with common pediatric symptoms. The findings suggest that ChatGPT has the potential to enhance clinical workflow and aid in responsible clinical decision-making. Further exploration and refinement of ChatGPT’s capabilities in pediatric care can potentially contribute to improved healthcare outcomes and patient management.