Abstract

Aim: This study aimed to evaluate the real-life performance of clinical vignettes and multiple-choice questions generated by ChatGPT.

Methods: This was a randomized controlled study in an evidence-based medicine training program. We randomly assigned seventy-four medical students to two groups. The ChatGPT group received ill-defined cases generated by ChatGPT, while the control group received human-written cases. At the end of the training, they evaluated the cases by rating 10 statements using a Likert scale. They also answered 15 multiple-choice questions (MCQs) generated by ChatGPT. The case evaluations of the two groups were compared. Some psychometric characteristics (item difficulty and point-biserial correlations) of the test were also reported.

Results: None of the ratings for the 10 statements about the cases differed significantly between the ChatGPT group and the control group (p > .05). In the test, only six MCQs had acceptable levels (higher than 0.30) of point-biserial correlation, and five items could be considered acceptable in classroom settings.

Conclusions: The results showed that the quality of the vignettes is comparable to those created by human authors, and that some of the multiple-choice questions have acceptable psychometric characteristics. ChatGPT has potential for generating clinical vignettes for teaching and MCQs for assessment in medical education.
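
For context, the two item indices reported above (item difficulty and the point-biserial correlation) are standard classical test theory statistics. The sketch below is illustrative only and assumes a matrix of dichotomously scored responses (1 = correct, 0 = incorrect); it is not the authors' analysis code, and the function name item_analysis is hypothetical.

import numpy as np

def item_analysis(responses):
    # responses: array of shape (n_students, n_items), 1 = correct, 0 = incorrect
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    difficulty = responses.mean(axis=0)      # proportion answering each item correctly
    point_biserial = np.empty(n_items)
    for j in range(n_items):
        item = responses[:, j]
        rest = responses.sum(axis=1) - item  # total score excluding item j
        # Pearson correlation of a 0/1 item with the rest score equals the
        # corrected point-biserial coefficient for that item.
        point_biserial[j] = np.corrcoef(item, rest)[0, 1]
    return difficulty, point_biserial

# Toy example: 5 students, 3 items
resp = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
difficulty, rpb = item_analysis(resp)
print(difficulty)  # item difficulty per item
print(rpb)         # point-biserial correlation per item

Items with point-biserial values at or above 0.30, the threshold cited in the Results, are conventionally taken to discriminate adequately between stronger and weaker examinees.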
