Abstract
Aim: This study aimed to evaluate the real-life performance of clinical vignettes and multiple-choice questions generated using ChatGPT.
Methods: This was a randomized controlled study in an evidence-based medicine training program. We randomly assigned seventy-four medical students to two groups. The ChatGPT group received ill-defined cases generated by ChatGPT, while the control group received human-written cases. At the end of the training, the students evaluated the cases by rating 10 statements on a Likert scale. They also answered 15 multiple-choice questions (MCQs) generated by ChatGPT. The case evaluations of the two groups were compared, and selected psychometric characteristics of the test (item difficulty and point-biserial correlations) were reported.
Results: None of the scores for the 10 statements about the cases differed significantly between the ChatGPT group and the control group (p > .05). In the test, only six MCQs had acceptable levels of point-biserial correlation (higher than 0.30), and five items could be considered acceptable in classroom settings.
Conclusions: The results showed that the quality of the ChatGPT-generated vignettes is comparable to those created by human authors, and some of the multiple-choice questions have acceptable psychometric characteristics. ChatGPT has potential for generating clinical vignettes for teaching and MCQs for assessment in medical education.
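The item difficulty and point-biserial correlation reported above are standard item-analysis statistics. As a minimal sketch only, and using fabricated response data rather than the study's actual items, they might be computed as follows (the 0.30 threshold is the common rule of thumb cited in the abstract):

```python
# Item-analysis sketch for dichotomously scored MCQ responses (hypothetical data).
# Assumes a 0/1 response matrix of shape (n_students, n_items).
import numpy as np

rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(74, 15))  # 74 students x 15 MCQs, fabricated

totals = responses.sum(axis=1)

for j in range(responses.shape[1]):
    item = responses[:, j]
    difficulty = item.mean()  # item difficulty: proportion answering correctly
    # Correlate the item with the rest-score (total minus this item) so the
    # item's own score does not inflate the correlation.
    rest = totals - item
    r_pb = np.corrcoef(item, rest)[0, 1]  # point-biserial correlation
    flag = "acceptable" if r_pb >= 0.30 else "review"
    print(f"Item {j + 1:2d}: difficulty={difficulty:.2f}, r_pb={r_pb:.2f} ({flag})")
```

Whether the study correlated items against the corrected (rest) score or the raw total is not stated in the abstract; the rest-score variant is shown here as one common choice.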
Source: https://doi.org/10.1080/0142159X.2024.2327477