Assessing ChatGPT’s capacity for clinical decision support in pediatrics: A comparative study with pediatricians using KIDMAP of Rasch analysis
Main Authors:
Format: Online Article Text
Language: English
Published: Lippincott Williams & Wilkins, 2023
Subjects:
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10289633/
https://www.ncbi.nlm.nih.gov/pubmed/37352054
http://dx.doi.org/10.1097/MD.0000000000034068
Summary: The application of large language models in clinical decision support (CDS) warrants further investigation. ChatGPT, a prominent large language model developed by OpenAI, has shown promising performance across various domains, but little research has evaluated its use specifically in pediatric clinical decision-making. This study assessed ChatGPT's potential as a CDS tool in pediatrics by evaluating its performance on 8 common clinical symptom prompts. The study addressed 2 research questions: ChatGPT's overall grade, on a scale from A (high) to E (low), relative to a normal sample, and the difference between 2 pediatricians' assessments of ChatGPT.

METHODS: We compared ChatGPT's responses to 8 items covering clinical symptoms commonly encountered by pediatricians. Two pediatricians independently assessed the open-ended answers provided by ChatGPT, scoring each from 0 to 100 and then transforming the scores into 5 ordinal categories. We simulated 300 virtual students with normally distributed abilities who scored the items under the Rasch rating scale model, with item difficulties ranging from −2 to 2.5 logits. Two visual presentations (a Wright map and a KIDMAP) were generated to answer the 2 research questions outlined in the objectives of the study.

RESULTS: The 2 pediatricians' assessments placed ChatGPT's overall performance at grade C on the A-to-E scale, with average scores of −0.89 logits (SE = 0.37) and 0.90 logits (SE = 0.41), respectively (logits are log-odds units in Rasch analysis). The difference between the 2 pediatricians' assessments was significant (P < .05).

CONCLUSION: This study demonstrates the feasibility of using ChatGPT as a CDS tool for patients presenting with common pediatric symptoms. The findings suggest that ChatGPT has the potential to enhance clinical workflow and aid responsible clinical decision-making. Further exploration and refinement of ChatGPT's capabilities in pediatric care may contribute to improved healthcare outcomes and patient management.
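The simulation step described in the Methods can be made concrete. Below is a minimal Python sketch of how responses from 300 virtual students to 8 items scored in 5 ordinal categories might be generated under the Andrich rating scale model the summary names. The standard-normal ability distribution, the evenly spaced item difficulties across the −2 to 2.5 logit range, and the threshold values `tau` are illustrative assumptions; the paper reports only the distribution shape, the item count, and the difficulty range, not these exact parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

N_PERSONS = 300   # virtual students, as in the study
N_ITEMS = 8       # clinical symptom prompts
N_CATS = 5        # ordinal score categories (0..4)

# Person abilities in logits. Assumption: standard normal; the paper
# specifies a normal distribution but not its mean or SD.
theta = rng.normal(loc=0.0, scale=1.0, size=N_PERSONS)

# Item difficulties spread evenly across the reported -2 to 2.5 logit range.
delta = np.linspace(-2.0, 2.5, N_ITEMS)

# Andrich thresholds shared by all items (rating scale model).
# These values are illustrative, not taken from the paper.
tau = np.array([-1.5, -0.5, 0.5, 1.5])

def category_probs(theta_p, delta_i):
    """P(X = k) for k = 0..4 under the Andrich rating scale model."""
    # log-numerator for category k is sum_{j=1..k} (theta - delta - tau_j),
    # with the k = 0 term fixed at 0.
    steps = np.concatenate(([0.0], theta_p - delta_i - tau))
    logits = np.cumsum(steps)
    logits -= logits.max()            # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Simulate the full 300 x 8 ordinal response matrix.
responses = np.empty((N_PERSONS, N_ITEMS), dtype=int)
for p in range(N_PERSONS):
    for i in range(N_ITEMS):
        responses[p, i] = rng.choice(N_CATS, p=category_probs(theta[p], delta[i]))

print(responses[:5])                              # first 5 virtual students
print("mean raw score:", responses.sum(axis=1).mean())
```

This sketch covers only the data-generation step. Fitting such a response matrix with standard Rasch software (for example, the eRm package in R, or Winsteps) would yield the person and item measures from which Wright map and KIDMAP displays like those used in the study are drawn.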