
BMC Oral Health, Journal year: 2025, Issue 25(1)
Published: April 15, 2025
Artificial intelligence (AI) has rapidly advanced in healthcare and dental education, significantly impacting diagnostic processes, treatment planning, and academic training. The aim of this study is to evaluate the performance differences between different large language models (LLMs) by analyzing the accuracy rates of their answers to multiple-choice oral pathology questions. This study evaluates eight LLMs (Gemini 1.5, Gemini 2, ChatGPT 4o, ChatGPT 4, ChatGPT o1, Copilot, Claude 3.5, Deepseek) answering multiple-choice questions from the Turkish Dental Specialization Examination (DUS). A total of 100 questions from 2012 to 2021 were analyzed. Questions were classified as "case-based" or "knowledge-based". Responses were classified as "correct" or "incorrect" based on official answer keys. To prevent learning biases, no follow-up feedback was provided after the LLMs' responses. Significant performance differences were observed among the LLMs (p < 0.001). ChatGPT o1 achieved the highest accuracy (96 correct, 4 incorrect), followed by [...] (84 correct), and Gemini 2 and Deepseek (82 correct each). Copilot had the lowest accuracy (61 correct). Case-based questions showed notable variations (p = 0.034), where [...] excelled. For knowledge-based questions, [...] demonstrated [...]. Post-hoc analysis revealed that [...] performed better than most other LLMs across both case-based and knowledge-based questions (p [...] 0.0031). The LLMs showed variable proficiency, with [...] showing higher accuracy. [...] shows promise as a supplementary educational tool, though further validation is required.
Language: English
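
The correct-answer counts reported in the abstract can be turned into accuracy rates with a few lines of Python. The sketch below is illustrative only: it includes just the models whose counts are stated explicitly, the model labels follow the list of evaluated LLMs, and the pairwise Pearson chi-square test (no continuity correction) is an assumption made for the example, not necessarily the statistical procedure the authors used.

```python
import math

# Correct answers out of 100 questions, as reported in the abstract.
# Models whose counts are not stated explicitly are left out.
TOTAL_QUESTIONS = 100
correct_counts = {
    "o1": 96,
    "Gemini 2": 82,
    "Deepseek": 82,
    "Copilot": 61,
}


def accuracy(correct, total=TOTAL_QUESTIONS):
    """Fraction of questions answered correctly."""
    return correct / total


def chi_square_2x2(correct_a, correct_b, total=TOTAL_QUESTIONS):
    """Pearson chi-square for two models answering the same number of questions.

    Illustrative choice of test (an assumption), without continuity correction.
    Returns the statistic and its p-value for 1 degree of freedom.
    """
    observed = [
        [correct_a, total - correct_a],
        [correct_b, total - correct_b],
    ]
    grand_total = 2 * total
    column_totals = [correct_a + correct_b, grand_total - correct_a - correct_b]
    statistic = 0.0
    for row in observed:
        for j, obs in enumerate(row):
            expected = total * column_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


accuracies = {model: accuracy(count) for model, count in correct_counts.items()}
stat, p_value = chi_square_2x2(correct_counts["o1"], correct_counts["Copilot"])

for model, acc in accuracies.items():
    print(f"{model}: {acc:.0%}")
print(f"o1 vs Copilot: chi2 = {stat:.2f}, p = {p_value:.2e}")
```

With 100 questions per model, each accuracy rate equals the reported correct count read as a percentage; the pairwise comparison only illustrates how a gap such as 96 versus 61 correct could be tested.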