Evaluating the Accuracy and Reliability of Large Language Models (ChatGPT, Claude, DeepSeek, Gemini, Grok, and Le Chat) in Answering Item-Analyzed Multiple-Choice Questions on Blood Physiology
Mayank Agarwal, Priyanka Sharma, P. A. Wani et al.

Cureus, Journal Year: 2025, Volume and Issue: unknown

Published: April 8, 2025

Language: English

A systematic review of the impact of artificial intelligence on educational outcomes in health professions education
Eva Feigerlová, Hind Hani, Ellie Hothersall-Davies et al.

BMC Medical Education, Journal Year: 2025, Volume and Issue: 25(1)

Published: Jan. 27, 2025

Language: English

Citations: 4

Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis

Hye Kyung Jin, Ha Eun Lee, Eun Young Kim et al.

BMC Medical Education, Journal Year: 2024, Volume and Issue: 24(1)

Published: Sept. 16, 2024

ChatGPT, a recently developed artificial intelligence (AI) chatbot, has demonstrated improved performance on examinations in the medical field. However, an overall evaluation of the ChatGPT models (ChatGPT-3.5 and GPT-4) across a variety of national health licensing examinations is thus far lacking. This study aimed to provide a comprehensive assessment of the models' performance on medical, pharmacy, dentistry, and nursing licensing examinations through a systematic review and meta-analysis.

Language: English

Citations: 9

Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis (Preprint)
Hui Zong, Rongrong Wu, Jiaxue Cha et al.

Journal of Medical Internet Research, Journal Year: 2024, Volume and Issue: 26, P. e66114 - e66114

Published: Dec. 10, 2024

Background Large language models (LLMs) are increasingly integrated into medical education, with transformative potential for learning and assessment. However, their performance across diverse medical exams globally has remained underexplored. Objective This study aims to introduce MedExamLLM, a comprehensive platform designed to systematically evaluate the performance of LLMs on medical exams worldwide. Specifically, the platform seeks to (1) compile and curate performance data for LLMs on medical exams worldwide; (2) analyze trends and disparities in LLM capabilities across geographic regions, languages, and exam contexts; and (3) provide a resource for researchers, educators, and developers to explore and advance the integration of artificial intelligence into medical education. Methods A systematic search was conducted on April 25, 2024, in the PubMed database to identify relevant publications. Inclusion criteria encompassed peer-reviewed, English-language, original research articles that evaluated the performance of at least one LLM on medical exams. Exclusion criteria included review articles, non-English publications, preprints, and studies without reported model performance. The screening of candidate publications was performed independently by 2 researchers to ensure accuracy and reliability. Data, including exam information, model performance, availability, and references, were manually curated, standardized, and organized. The curated data were integrated into the MedExamLLM platform, enabling functionality to visualize geographic, linguistic, and exam characteristics. The web platform was developed with a focus on accessibility, interactivity, and scalability to support continuous updates and user engagement. Results A total of 193 publications were included in the final analysis, comprising information on 16 LLMs and 198 medical exams across 28 countries and 15 languages from the year 2009 to the year 2023. The United States accounted for the highest number of exams and related publications, with English being the dominant language used in these exams. The Generative Pretrained Transformer (GPT) series models, especially GPT-4, demonstrated superior performance, achieving pass rates significantly higher than other LLMs. The analysis revealed significant variability in LLM capabilities across different geographic regions and linguistic contexts. Conclusions MedExamLLM is an open-source, freely accessible, and publicly available online platform providing comprehensive evaluation evidence and knowledge about LLM performance on medical exams around the world. It serves as a valuable resource for the fields of medical education, clinical medicine, and artificial intelligence. By synthesizing evidence on LLM capabilities, the platform provides insights for the integration of artificial intelligence into medical education. Limitations include potential biases in the publication sources and the exclusion of non-English literature. Future research should address these gaps and enhance evaluation methods.
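To make the platform's curation concrete, here is a minimal Python sketch of the kind of aggregation MedExamLLM performs: grouping curated exam records by model and language and computing pass rates. The records and their fields (model, language, passed) are hypothetical placeholders, not the platform's actual data or code.

from collections import defaultdict

# Hypothetical curated records (placeholders, not MedExamLLM's data).
records = [
    {"model": "GPT-4", "language": "English", "passed": True},
    {"model": "GPT-4", "language": "Japanese", "passed": True},
    {"model": "GPT-3.5", "language": "English", "passed": True},
    {"model": "GPT-3.5", "language": "Japanese", "passed": False},
]

# Tally passes and attempts per (model, language) pair.
tally = defaultdict(lambda: [0, 0])  # key -> [passes, attempts]
for rec in records:
    key = (rec["model"], rec["language"])
    tally[key][0] += int(rec["passed"])
    tally[key][1] += 1

# Report pass rates, mirroring the platform's per-language comparisons.
for (model, language), (passes, attempts) in sorted(tally.items()):
    print(f"{model} / {language}: {passes}/{attempts} exams passed ({passes / attempts:.0%})")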

Language: English

Citations: 9

Opportunities and Challenges in Harnessing Digital Technology for Effective Teaching and Learning
Zhongzhou Chen, Chandralekha Singh

Trends in Higher Education, Journal Year: 2025, Volume and Issue: 4(1), P. 6 - 6

Published: Jan. 27, 2025

Most of today’s educators have no shortage of digital and online learning technologies available at their fingertips, ranging from Learning Management Systems such as Canvas, Blackboard, or Moodle, to online meeting tools, homework and tutoring systems, exam proctoring platforms, computer simulations, and even virtual reality/augmented reality technologies. Furthermore, with the rapid development and wide availability of generative artificial intelligence (GenAI) services such as ChatGPT, we are just beginning to harness their potential to transform higher education. Yet, facing the large number of options provided by cutting-edge technology, the imminent question on the mind of most educators is the following: how should I choose and integrate these technologies into my teaching process so that they would best support student learning? We contemplate these types of important and timely questions and share our reflections on evidence-based approaches to choosing and using digital tools, drawing on a Self-regulated Engaged Framework that we have employed in physics education research and that can be valuable for other disciplines.

Language: English

Citations: 1

Digital Innovation in Medical Education: The Process and Challenges of Digital Transformation
Youngjon Kim

Korean Medical Education Review, Journal Year: 2025, Volume and Issue: 27(1), P. 6 - 16

Published: Feb. 28, 2025

Digital transformation in medical education has emerged as a critical driver of educational innovation, but it also presents several challenges and issues. This exploratory study was conducted in preparation for the digital leap in medical education, critically examining the meaning and process of digital transformation in medical schools in Korea. The process has been divided into digitization, digitalization, and digital transformation, reflecting the progressive course of digital education. Approaches involving digital technology are described in this study, differentiating between learner-centered adaptive learning, experience-based immersive learning environments, and the integration of assessment and learning. Additionally, the potential of emerging technologies, such as large language models, cloud computing, and blockchain, is explored. Constraints on digital transformation include limitations in the digitalization of materials, a lack of empirical evidence on the effectiveness of digital tools, unpreparedness of stakeholders, and ethical, physical, and psychological issues. The conclusion emphasizes that digital transformation should not be a temporary measure but a true advancement of education, highlighting the importance of instructional design based on learner needs to increase effectiveness. It highlights the ethical use of digital tools and the creation of a safe environment to ensure fairness and trust in the process. Finally, it underscores the significance of a flexible curriculum driven by learner needs, interdisciplinary approaches, and the evaluation and dissemination of digital innovation initiatives.

Language: English

Citations: 1

Comparison of ChatGPT‐4, Copilot, Bard and Gemini Ultra on an Otolaryngology Question Bank
Rashi Ramchandani, Eddie Guo, Michael Mostowy et al.

Clinical Otolaryngology, Journal Year: 2025, Volume and Issue: unknown

Published: March 13, 2025

ABSTRACT Objective To compare the performance of Google Bard, Microsoft Copilot, GPT‐4 with vision (GPT‐4) and Gemini Ultra on OTO Chautauqua, a student‐created, faculty‐reviewed otolaryngology question bank. Study Design Comparative evaluation of different LLMs. Setting N/A. Participants and Methods Large language models (LLMs) are being extensively tested in medical education; however, their accuracy and effectiveness remain understudied, particularly in otolaryngology. This study involved inputting 350 single‐best‐answer multiple‐choice questions, including 18 image‐based questions, into the four LLMs. Questions were sourced from six independent question banks and related to (a) rhinology, (b) head and neck oncology, (c) endocrinology, (d) general otolaryngology, (e) paediatrics, (f) otology, (g) facial plastics and reconstruction, and (h) trauma. The LLMs were instructed to provide an output with reasoning for their answers, the length of which was recorded. Results Aggregate and subgroup analysis revealed that GPT‐4 (79.8%) outperformed the other LLMs in accuracy, followed by Gemini Ultra (71.1%), Copilot (68.0%), and Bard (65.1%). The models had significantly different average response lengths: the longest averaged x̄ = 1685.24, there was no difference between two of the models (x̄ = 827.34 and x̄ = 904.12), and Gemini's longer responses (x̄ = 1291.68) included explanatory images and links. Gemini Ultra correctly answered the image‐based questions (n = 18), unlike the other LLMs, highlighting its adaptability and multimodal capabilities. Conclusion In terms of accuracy, GPT‐4 outperformed Gemini Ultra, Copilot, and Bard; Gemini Ultra, although it has the second‐highest accuracy, provides concise and relevant explanations. Despite these promising results, learners should cautiously assess the decision‐making reliability of LLMs.
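The abstract reports aggregate accuracies but not the statistical test behind the subgroup comparisons; as a sketch only, the following Python snippet shows one standard way to compare two of the reported accuracies (GPT‐4 at 79.8% and Bard at 65.1% on 350 questions) with a chi-square test on a 2 x 2 contingency table.

from scipy.stats import chi2_contingency

# Aggregate results reported in the abstract: 350 questions per model.
n = 350
gpt4_correct = round(0.798 * n)   # 279 correct answers
bard_correct = round(0.651 * n)   # 228 correct answers

# 2 x 2 contingency table: rows = models, columns = correct / incorrect.
table = [
    [gpt4_correct, n - gpt4_correct],
    [bard_correct, n - bard_correct],
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")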

Language: English

Citations: 1

Evaluating the application of ChatGPT in China’s residency training education: An exploratory study

Luxiang Shang, Rui Li, Mingyue Xue et al.

Medical Teacher, Journal Year: 2024, Volume and Issue: unknown, P. 1 - 7

Published: July 12, 2024

The purpose of this study was to assess the utility of information generated by ChatGPT for residency education in China.

Language: English

Citations: 7

ChatGPT versus a customized AI chatbot (Anatbuddy) for anatomy education: A comparative pilot study

G. Arun, Vivek Perumal, Francis Paul John Bato Urias et al.

Anatomical Sciences Education, Journal Year: 2024, Volume and Issue: 17(7), P. 1396 - 1405

Published: Aug. 21, 2024

Large Language Models (LLMs) have the potential to improve education by personalizing learning. However, ChatGPT-generated content has been criticized for sometimes producing false, biased, and/or hallucinatory information. To evaluate AI's ability to return clear and accurate anatomy information, this study generated a custom interactive intelligent chatbot (Anatbuddy) through an OpenAI Application Programming Interface (API) that enables seamless AI-driven interactions within a secured cloud infrastructure. Anatbuddy was programmed with a Retrieval Augmented Generation (RAG) method to provide context-aware responses to user queries based on a predetermined knowledge base. To compare their outputs, various questions (i.e., prompts) on thoracic anatomy (n = 18) were fed into Anatbuddy and ChatGPT 3.5. A panel comprising three experienced anatomists evaluated both tools' responses for factual accuracy, relevance, completeness, coherence, and fluency on a 5-point Likert scale. These ratings were reviewed by a third party blinded to the study, who revised and finalized the scores as needed. Anatbuddy's factual accuracy (mean ± SD 4.78/5.00 ± 0.43; median 5.00) was rated significantly higher (U = 84, p = 0.01) than ChatGPT's (4.11 ± 0.83; median 4.00). No statistically significant differences were detected between the chatbots for the other variables. Given ChatGPT's current limitations, we strongly recommend that the anatomy education profession develop chatbots utilizing a carefully curated knowledge base to ensure accuracy. Further research is needed to determine students' acceptance of interactive chatbots and their influence on learning experiences and outcomes.
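As a reproducibility sketch, the snippet below runs the two-sided Mann-Whitney U test implied by the reported U statistic on two samples of 5-point Likert ratings. The rating values are invented placeholders standing in for the anatomists' scores, so the output will not match the study's U = 84, p = 0.01.

from scipy.stats import mannwhitneyu

# Invented placeholder Likert ratings (1-5); not the study's data.
anatbuddy = [5, 5, 4, 5, 5, 4, 5, 5, 5, 4, 5, 5, 5, 4, 5, 5, 5, 5]
chatgpt = [4, 4, 3, 5, 4, 4, 3, 5, 4, 4, 5, 3, 4, 4, 5, 4, 4, 4]

# Two-sided Mann-Whitney U test comparing the two rating distributions.
stat, p = mannwhitneyu(anatbuddy, chatgpt, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")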

Language: English

Citations: 6

The Emerging Role of Large Language Models in Improving Prostate Cancer Literacy
Marius Geantă, Daniel Bădescu, Narcis Chirca et al.

Bioengineering, Journal Year: 2024, Volume and Issue: 11(7), P. 654 - 654

Published: June 27, 2024

This study assesses the effectiveness of chatbots powered by Large Language Models (LLMs), namely ChatGPT 3.5, CoPilot, and Gemini, in delivering prostate cancer information compared to the official Patient’s Guide. Using 25 expert-validated questions, we conducted a comparative analysis to evaluate accuracy, timeliness, completeness, and understandability through a Likert scale. Statistical analyses were used to quantify the performance of each model. Results indicate that ChatGPT 3.5 consistently outperformed the other models, establishing itself as a robust and reliable source of information. CoPilot also performed effectively, albeit slightly less so than ChatGPT 3.5. Despite the strengths of the Patient’s Guide, the advanced capabilities of LLMs like ChatGPT 3.5 can significantly enhance educational tools in healthcare. The findings underscore the need for ongoing innovation and improvement in AI applications within health sectors, especially considering the ethical implications underscored by the forthcoming EU AI Act. Future research should focus on investigating potential biases in AI-generated responses and their impact on patient outcomes.

Language: English

Citations: 5

Framework for Integrating Generative AI in Developing Competencies for Accounting and Audit Professionals
Ionuț Anica-Popa, Marinela Vrîncianu, Liana-Elena Anica-Popa et al.

Electronics, Journal Year: 2024, Volume and Issue: 13(13), P. 2621 - 2621

Published: July 4, 2024

The study aims to identify the knowledge, skills, and competencies required by accounting and auditing (AA) professionals in the context of integrating disruptive Generative Artificial Intelligence (GenAI) technologies, and to develop a framework for integrating GenAI capabilities into organisational systems, harnessing its potential to revolutionise lifelong learning and development and to assist day-to-day operations and decision-making. Through a systematic literature review, 103 papers were analysed to outline, in the current business ecosystem, the demand for competencies generated by AI adoption and, in particular, the associated risks, thus contributing to the body of knowledge in underexplored research areas. Positioned at the confluence of accounting, auditing, and GenAI, the paper introduces a meaningful overview of areas for effective data analysis, interpretation of findings, and risk awareness and management. It emphasizes how GenAI reshapes the role of professionals in discovering its true potential and adopting it accordingly. A new LLM-based system model that can enhance GenAI capabilities through collaboration with similar systems is proposed, and an explanatory scenario is provided to illustrate its applicability in the audit area.

Language: English

Citations: 5