Evaluating the Performance of ChatGPT in the Prescribing Safety Assessment: Implications for Artificial Intelligence-Assisted Prescribing

D.R. Bull,

Dide Okaygoun

Cureus, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 4, 2024

Objective: With the rapid advancement of artificial intelligence (AI) technologies, models like Chat Generative Pre-Trained Transformer (ChatGPT) are increasingly being evaluated for their potential applications in healthcare. The Prescribing Safety Assessment (PSA) is a standardised test taken by junior physicians in the UK to evaluate prescribing competence. This study aims to assess ChatGPT's ability to pass the PSA and its performance across different exam sections. Methodology: ChatGPT (version GPT-4) was tested on four official practice papers, each containing 30 questions, with three independent trials per paper and answers marked using the official mark schemes. Performance was measured by calculating overall percentage scores and comparing them with the pass marks provided for each paper. Subsection performance was also analysed to identify strengths and weaknesses. Results: ChatGPT achieved mean scores of 257/300 (85.67%), 236/300 (78.67%), 199/300 (66.33%), and 233/300 (77.67%), consistently surpassing the pass marks where available. It performed well in sections requiring factual recall, such as "Adverse Drug Reactions", scoring 63/72 (87.50%), and "Communicating Information" (88.89%). However, it struggled with "Data Interpretation", scoring 32/72 (44.44%) and showing variability that indicates limitations in handling more complex clinical reasoning tasks. Conclusion: While ChatGPT demonstrated strong performance, passing the papers and excelling at factual knowledge, its weaknesses in data interpretation highlight current gaps in AI's ability to fully replicate human judgement. ChatGPT shows promise in supporting safe prescribing, particularly in areas prone to error, such as drug interactions and communicating correct information. However, due to its variable performance on complex tasks, it is not yet ready to replace prescribers and should instead serve as a supplemental tool in practice.
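
As a quick arithmetic check, the percentage figures quoted in this abstract follow directly from the raw marks. The short Python sketch below is not the authors' code; it simply reproduces the reported percentages, assuming each practice paper is marked out of 300 and each subsection out of 72, as stated above.

```python
# Minimal sketch (not the authors' code): reproducing the percentage scores
# reported in the abstract from the raw marks. Each practice paper is assumed
# to be marked out of 300 and each subsection out of 72, as stated above.

paper_marks = {"Paper 1": 257, "Paper 2": 236, "Paper 3": 199, "Paper 4": 233}
section_marks = {"Adverse Drug Reactions": 63, "Data Interpretation": 32}

for paper, mark in paper_marks.items():
    print(f"{paper}: {mark}/300 = {mark / 300:.2%}")   # e.g. Paper 1: 257/300 = 85.67%

for section, mark in section_marks.items():
    print(f"{section}: {mark}/72 = {mark / 72:.2%}")   # e.g. 63/72 = 87.50%
```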

Language: English

Comparative Evaluation of AI Models Such as ChatGPT 3.5, ChatGPT 4.0, and Google Gemini in Neuroradiology Diagnostics
Rishi Gupta,

Abdullgabbar M Hamid,

Miral D. Jhaveri

et al.

Cureus, Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 25, 2024

Advances in artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT (versions 3.5 and 4.0) and Google Gemini, are transforming healthcare. This study explores the performance of these AI models in solving diagnostic quizzes from "Neuroradiology: A Core Review" to evaluate their potential as diagnostic tools in radiology.

Language: English

Citations

5

The Performance of Artificial Intelligence-based Large Language Models on Ophthalmology-related Questions in Swedish Proficiency Test for Medicine: ChatGPT-4 omni vs Gemini 1.5 Pro
Mehmet Cem Sabaner, Arzu Seyhan Karatepe, Kemal Mert Mutibayraktaroglu

et al.

Deleted Journal, Journal Year: 2024, Volume and Issue: unknown, P. 100070 - 100070

Published: Sept. 1, 2024

Language: English

Citations

4

Evaluating AI Proficiency in Nuclear Cardiology: Large Language Models take on the Board Preparation Exam

Valerie Builoff,

Aakash Shanbhag, Robert J.H. Miller

et al.

Journal of Nuclear Cardiology, Journal Year: 2024, Volume and Issue: unknown, P. 102089 - 102089

Published: Nov. 1, 2024

Language: English

Citations

4

Comparison of the performance of French orthopaedic surgery residents and the artificial intelligence ChatGPT-4/4o on the examinations for the specialised studies diploma (DES) in orthopaedic and trauma surgery

Nabih Maraqa,

Ramy Samargandi,

A. Poichotte

et al.

Revue de Chirurgie Orthopédique et Traumatologique, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 1, 2025

Citations

0

Assessing the performance of Microsoft Copilot, GPT-4 and Google Gemini in ophthalmology

Meziane Silhadi,

Wissam B. Nassrallah, David Mikhail

et al.

Canadian Journal of Ophthalmology, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 1, 2025

To evaluate the performance of large language models (LLMs), specifically Microsoft Copilot, GPT-4 (GPT-4o and GPT-4o mini), and Google Gemini (Gemini and Gemini Advanced), in answering ophthalmological questions, and to assess the impact of prompting techniques on their accuracy. Prospective qualitative study. A total of 300 questions from StatPearls were tested, covering a range of subspecialties and image-based tasks. Each question was evaluated using 2 prompting techniques: zero-shot forced prompting (prompt 1) and combined role-based and plan-and-solve+ prompting (prompt 2). With prompting, GPT-4o demonstrated significantly superior overall performance, correctly answering 72.3% of questions and outperforming all other models, including Copilot (53.7%), GPT-4o mini (62.0%), Gemini (54.3%), and Gemini Advanced (62.0%) (p < 0.0001). Notable improvements were seen with prompt 2 over prompt 1, elevating Copilot's accuracy from the lowest (53.7%) to the second highest (72.3%) among the LLMs. While newer iterations of LLMs, such as GPT-4o and Gemini Advanced, outperformed their less advanced counterparts, this study emphasizes the need for caution in the clinical application of these models. The choice of prompting technique influences accuracy, highlighting the necessity of further research to refine LLM capabilities, particularly in visual data interpretation, and to ensure safe integration into medical practice.
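
To illustrate the kind of comparison behind the reported p < 0.0001, the hedged Python sketch below tests the gap between two of the reported accuracies (GPT-4o's 72.3% versus Copilot's 53.7%) with a 2x2 chi-square contingency test. The correct-answer counts are reconstructed by rounding the published percentages over the 300-question set; they are assumptions, not the authors' raw data, and the authors' own statistical method may differ.

```python
# Hedged sketch: testing whether two reported accuracies on the same 300
# questions differ significantly, using a 2x2 chi-square contingency test.
# Correct-answer counts are rounded from the published percentages (an
# assumption), not taken from the authors' raw data.
from scipy.stats import chi2_contingency

n_questions = 300
acc_gpt4o = 0.723      # GPT-4o accuracy, as reported in the abstract
acc_copilot = 0.537    # Copilot accuracy, as reported in the abstract

correct_gpt4o = round(acc_gpt4o * n_questions)      # ~217 correct answers
correct_copilot = round(acc_copilot * n_questions)  # ~161 correct answers

table = [
    [correct_gpt4o, n_questions - correct_gpt4o],
    [correct_copilot, n_questions - correct_copilot],
]
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")  # p is well below 0.05 for a gap this large
```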

Language: English

Citations

0

Evaluating the Accuracy of Gemini 2.0 Advanced and ChatGPT 4o in Cataract Knowledge: A Performance Analysis Using Brazilian Council of Ophthalmology Board Exam Questions

Diego Casagrande,

Mauro Gobira

Cureus, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 24, 2025

Introduction: Large language models (LLMs) like Gemini 2.0 Advanced and ChatGPT-4o are increasingly applied in medical contexts. This study assesses their accuracy in answering cataract-related questions from Brazilian ophthalmology board exams, evaluating their potential for clinical decision support. Methods: A retrospective analysis was conducted using 221 multiple-choice questions. Responses from both LLMs were evaluated by two independent ophthalmologists against the official answer key. Accuracy rates and inter-evaluator agreement (Cohen's kappa) were analyzed. Results: Gemini 2.0 Advanced achieved 85.45% and 80.91% accuracy across the two evaluators, while ChatGPT-4o scored 80.00% and 84.09%. Inter-evaluator agreement was moderate (κ = 0.514 and 0.431, respectively). Performance varied across exam years. Conclusion: Both models demonstrated high accuracy on cataract-related questions, supporting their use as educational tools. However, the performance variability indicates the need for further refinement and validation.
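
For readers unfamiliar with the agreement statistic reported above, the Python sketch below shows how Cohen's kappa is computed for two evaluators who each mark every model response as correct or incorrect. The judgment vectors are invented for illustration and are not data from the study.

```python
# Minimal sketch (not from the paper): Cohen's kappa between two evaluators
# who each mark every model response as correct (1) or incorrect (0).
# The example labels below are hypothetical, for illustration only.

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two binary raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    p_a1 = sum(rater_a) / n    # proportion rater A marked "correct"
    p_b1 = sum(rater_b) / n    # proportion rater B marked "correct"
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)               # chance agreement
    return (p_o - p_e) / (1 - p_e)

rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # hypothetical judgments, rater A
rater_b = [1, 0, 0, 1, 1, 1, 1, 0, 1, 0]   # hypothetical judgments, rater B
print(f"kappa = {cohen_kappa(rater_a, rater_b):.3f}")
```

Kappa values in the 0.41 to 0.60 range, such as the 0.514 and 0.431 reported here, are conventionally read as moderate agreement.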

Language: English

Citations

0

Comparison of ChatGPT-4o, Google Gemini 1.5 Pro, Microsoft Copilot Pro, and Ophthalmologists in the management of uveitis and ocular inflammation: A comparative study of large language models

Senol Demir

Journal Français d Ophtalmologie, Journal Year: 2025, Volume and Issue: 48(4), P. 104468 - 104468

Published: March 13, 2025

Language: English

Citations

0

Accuracy and quality of ChatGPT-4o and Google Gemini performance on image-based neurosurgery board questions

Suyash Sau,

Derek D. George, Rohin Singh

et al.

Neurosurgical Review, Journal Year: 2025, Volume and Issue: 48(1)

Published: March 25, 2025

Language: English

Citations

0

Retina Meets Artificial Intelligence
Paras P. Shah, Margarita Labkovich, Daniel Zhu

et al.

Advances in Ophthalmology and Optometry, Journal Year: 2025, Volume and Issue: unknown

Published: April 1, 2025

Citations

0

Evaluating AI Proficiency in Nuclear Cardiology: Large Language Models take on the Board Preparation Exam

Valerie Builoff,

Aakash Shanbhag,

Robert JH Miller

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: July 16, 2024

Previous studies have evaluated the ability of large language models (LLMs) in medical disciplines; however, few have focused on image analysis, and none specifically on cardiovascular imaging or nuclear cardiology.

Language: English

Citations

3