Evolution of AI in Anatomy Education: Comparing Current Large Language Models Against Historical ChatGPT Performance on USMLE-Style Questions
Olena Bolgova, Volodymyr Mavrych

Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown

Published: March 24, 2025

Abstract Background The integration of Large Language Models (LLMs) in medical education has gained significant attention, particularly for their ability to handle complex knowledge assessments. However, comprehensive evaluation of their performance on anatomical knowledge remains limited. The aim was to evaluate the accuracy of current LLMs, compared with previous versions, in answering anatomy multiple-choice questions, and to assess their reliability across different topics. Methods We analyzed four LLMs (GPT-4o, Claude, Copilot, Gemini) on 325 USMLE-style MCQs covering seven topics. Each model attempted every question three times. Results were compared with the previous year's GPT-3.5 performance and with random guessing. Statistical analysis included chi-square tests for differences. Results Current models achieved an average accuracy of 76.8 ± 12.2%, significantly higher than GPT-3.5 (44.4 ± 8.5%) and random responses (19.4 ± 5.9%). GPT-4o demonstrated the highest accuracy (92.9 ± 2.5%), followed by Claude (76.7 ± 5.7%), Copilot (73.9 ± 11.9%), and Gemini (63.7 ± 6.5%). Performance varied across topics, with Head & Neck (79.5%) and Abdomen (78.7%) showing the highest rates, while Upper Limb showed the lowest (72.9%). Only 29.5% of questions were answered correctly by all LLMs, and 2.5% were never answered correctly. Chi-square tests confirmed significant differences between models and topics (χ² = 182.11–518.32, p < 0.001). Conclusions Current LLMs show markedly improved performance on anatomy assessments compared with earlier versions, demonstrating superior accuracy and consistency. Topic-related variations suggest the need for careful consideration in educational applications. These tools show promise as supplementary resources while highlighting the continued necessity of human expertise. Clinical trial number: Not applicable.
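As an illustration of the between-model comparison described in this abstract, the sketch below runs a chi-square test on a models-by-(correct, incorrect) contingency table. The counts are approximations derived from the reported mean accuracies over 325 questions × 3 attempts per model, not the study's raw per-question data, so the resulting statistic will not exactly reproduce the published χ² range.

```python
# Minimal sketch of a between-model chi-square comparison.
# Counts are approximated from the reported mean accuracies
# (GPT-4o 92.9%, Claude 76.7%, Copilot 73.9%, Gemini 63.7%)
# over 325 MCQs x 3 attempts = 975 responses per model;
# the study's actual per-question data are not reproduced here.
from scipy.stats import chi2_contingency

n_attempts = 325 * 3  # 325 USMLE-style MCQs, each attempted three times
accuracies = {"GPT-4o": 0.929, "Claude": 0.767, "Copilot": 0.739, "Gemini": 0.637}

# Build a models x (correct, incorrect) contingency table
table = [
    [round(acc * n_attempts), n_attempts - round(acc * n_attempts)]
    for acc in accuracies.values()
]

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3g}")
```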

Language: English

Technology-enhanced learning in medical education in the age of artificial intelligence
Kyong‐Jee Kim

Forum for Education Studies, Journal Year: 2025, Volume and Issue: 3(2), P. 2730 - 2730

Published: April 1, 2025

This paper explores the transformative role of artificial intelligence (AI) in medical education, emphasizing its potential as a pedagogical tool for technology-enhanced learning. It highlights AI's potential to enhance the learning process through various inquiry-based strategies and to support Competency-Based Medical Education (CBME) by generating high-quality assessment items with automated and personalized feedback, analyzing assessment data from both human supervisors and AI, and helping predict the future professional behavior of current trainees. It also addresses the inherent challenges and limitations of using AI in student assessment, calling for guidelines to ensure its valid and ethical use. Furthermore, the integration of AI into virtual patient (VP) technology to offer experiences of patient encounters significantly enhances interactivity and realism, overcoming the limitations of conventional VPs. Although incorporating chatbots into VPs is promising, further research is warranted on their generalizability across clinical scenarios. The paper also discusses the learning preferences of Generation Z learners and suggests a conceptual framework for integrating AI into teaching and supporting learning, aligning with the needs of today's students while utilizing the adaptive capabilities of AI. Overall, this paper identifies areas of medical education where AI can play pivotal roles in overcoming educational challenges and offers perspectives on future developments in medical education. It calls for research to advance the theory and practice of AI tools, to innovate educational practices tailored to learners' needs, and to understand the long-term impacts of AI-driven learning environments.

Language: English

Citations

0

Generative AI vs. human expertise: a comparative analysis of case-based rational pharmacotherapy question generation
Muhammed Cihan Güvel, Yavuz Selim Kıyak, Hacer Doğan Varan, et al.

European Journal of Clinical Pharmacology, Journal Year: 2025, Volume and Issue: unknown

Published: April 9, 2025

Language: English

Citations

0
