Evaluating AI Proficiency in Nuclear Cardiology: Large Language Models take on the Board Preparation Exam

Valerie Builoff, Aakash Shanbhag, Robert J.H. Miller et al.

Journal of Nuclear Cardiology, Journal Year: 2024, Volume and Issue: unknown, P. 102089 - 102089

Published: Nov. 1, 2024

Language: English

Comparison of the problem-solving performance of ChatGPT-3.5, ChatGPT-4, Bing Chat, and Bard for the Korean emergency medicine board examination question bank
Go Un Lee, Dae Young Hong, Sin Young Kim et al.

Medicine, Journal Year: 2024, Volume and Issue: 103(9), P. e37325 - e37325

Published: March 1, 2024

Large language models (LLMs) have been deployed in diverse fields, and the potential for their application in medicine has been explored through numerous studies. This study aimed to evaluate and compare the performance of ChatGPT-3.5, ChatGPT-4, Bing Chat, and Bard on the Emergency Medicine Board Examination question bank in the Korean language. Of the 2353 questions in the bank, 150 were randomly selected, and 27 containing figures were excluded. Questions that required abilities such as analysis, creative thinking, evaluation, and synthesis were classified as higher-order questions, and those requiring only recall, memory, and factual information were classified as lower-order questions. The answers and explanations obtained by inputting the remaining 123 questions into the LLMs were analyzed and compared. ChatGPT-4 (75.6%) and Bing Chat (70.7%) showed higher rates of correct answers than ChatGPT-3.5 (56.9%) and Bard (51.2%). For higher-order questions, ChatGPT-4 achieved the highest rate of correct answers at 76.5%, followed by Bing Chat at 71.4%. The appropriateness of the explanation for the answer was significantly higher for ChatGPT-4 and Bing Chat than for ChatGPT-3.5 and Bard (75.6%, 68.3%, 52.8%, and 50.4%, respectively). ChatGPT-4 and Bing Chat outperformed ChatGPT-3.5 and Bard in answering a random selection of questions from the Korean emergency medicine board examination question bank.

Language: English

Citations: 10

A comparative study of English and Japanese ChatGPT responses to anaesthesia-related medical questions
Kazuo Ando, Sato Masaki, Shin Wakatsuki et al.

BJA Open, Journal Year: 2024, Volume and Issue: 10, P. 100296 - 100296

Published: June 1, 2024

Language: English

Citations: 9

Evaluating GPT-4o's Performance in the Official European Board of Radiology Exam: A Comprehensive Assessment
Muhammed Said Beşler, Laura Oleaga, Vanesa Junquero et al.

Academic Radiology, Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 1, 2024

Language: English

Citations: 9

Comparative analysis of ChatGPT and Gemini (Bard) in medical inquiry: a scoping review

Fattah H. Fattah, Abdulwahid M. Salih, Ameer M. Salih et al.

Frontiers in Digital Health, Journal Year: 2025, Volume and Issue: 7

Published: Feb. 3, 2025

Introduction: Artificial intelligence and machine learning are popular interconnected technologies. AI chatbots like ChatGPT and Gemini show considerable promise in medical inquiries. This scoping review aims to assess the accuracy and response length (in characters) of these applications. Methods: The eligible databases were searched to find studies published in English from January 1 to October 20, 2023. The inclusion criteria consisted of studies that focused on using ChatGPT and Gemini in medicine and assessed outcomes based on accuracy and character count (length). Data collected included the first author's name, the country where the study was conducted, the type of study design, publication year, sample size, speciality, accuracy, and response length. Results: The initial search identified 64 papers, with 11 meeting the inclusion criteria, involving 1,177 samples. ChatGPT showed higher accuracy in radiology (87.43% vs. Gemini's 71%) and gave shorter responses (907 vs. 1,428 characters). Similar trends were noted in other specialties. However, Gemini outperformed ChatGPT in emergency scenarios (87% vs. 77%) and in renal diets low in potassium and high in phosphorus (79% vs. 60% and 100% vs. 77%). Statistical analysis confirms that ChatGPT has greater accuracy and shorter responses than Gemini across the included studies, with a p-value <.001 for both metrics. Conclusion: This scoping review suggests that ChatGPT may demonstrate higher accuracy and provide shorter responses than Gemini in medical studies.

Language: English

Citations: 1

The Performance of GPT-3.5, GPT-4, and Bard on the Japanese National Dentist Examination: A Comparison Study
Keiichi Ohta, Satomi Ohta

Cureus, Journal Year: 2023, Volume and Issue: unknown

Published: Dec. 12, 2023

Purpose: This study aims to evaluate the performance of three large language models (LLMs), Generative Pre-trained Transformer (GPT)-3.5, GPT-4, and Google Bard, on the 2023 Japanese National Dentist Examination (JNDE) and to assess their potential clinical applications in Japan. Methods: A total of 185 questions from the JNDE were used. These were categorized by question type and category. McNemar's test compared correct response rates between two LLMs, while Fisher's exact test evaluated the LLMs within each category. Results: The overall correct response rate was 73.5% for GPT-4, 66.5% for Bard, and 51.9% for GPT-3.5. GPT-4 showed a significantly higher correct response rate than Bard. In the category of essential questions, GPT-4 achieved 80.5%, surpassing the passing criterion of 80%. In contrast, both Bard and GPT-3.5 fell short of this benchmark, with Bard attaining 77.6% and GPT-3.5 only 52.5%; the scores of GPT-4 and Bard were significantly higher than that of GPT-3.5 (p<0.01). For general questions, GPT-4 scored 71.2%, Bard 58.5%, and GPT-3.5 52.5%, with GPT-4 outperforming the other models. For professional dental questions, the scores were 51.6%, 45.3%, and 35.9%, respectively, and the differences among the LLMs were not statistically significant. All LLMs demonstrated lower accuracy on professional dentistry questions than on other question types. Conclusions: GPT-4 achieved the highest score on the JNDE, followed by Bard and GPT-3.5. However, only GPT-4 surpassed the passing criterion for essential questions. To further understand the application of LLMs worldwide, more research on examinations across different languages is required.

Language: English

Citations: 23

Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o
Enes Efe İş, Ahmet Kıvanç Menekşeoğlu

Clinical Rheumatology, Journal Year: 2024, Volume and Issue: 43(11), P. 3507 - 3513

Published: Sept. 28, 2024

Language: English

Citations: 8

How well do large language model-based chatbots perform in oral and maxillofacial radiology?

Hui Jeong, Sang‐Sun Han, Youngjae Yu et al.

Dentomaxillofacial Radiology, Journal Year: 2024, Volume and Issue: 53(6), P. 390 - 395

Published: June 7, 2024

Abstract Objectives: This study evaluated the performance of four large language model (LLM)-based chatbots by comparing their test results with those of dental students on an oral and maxillofacial radiology examination. Methods: ChatGPT, ChatGPT Plus, Bard, and Bing Chat were tested on 52 questions from regular college examinations. These were categorized into three educational content areas: basic knowledge, imaging equipment, and image interpretation. They were also classified as multiple-choice questions (MCQs) or short-answer questions (SAQs). The accuracy rates of the chatbots were compared with those of the students, and further analysis was conducted based on question type. Results: The students' overall accuracy rate was 81.2%, while that of the chatbots varied: 50.0% for ChatGPT, 65.4% for ChatGPT Plus, and 63.5% for Bing Chat. ChatGPT Plus achieved a higher accuracy rate for basic knowledge than the students (93.8% vs. 78.7%). However, all chatbots performed poorly in image interpretation, with accuracy rates below 35.0%. All chatbots scored less than 60.0% on MCQs, but performed better on SAQs. Conclusions: The performance of the chatbots in oral and maxillofacial radiology was unsatisfactory. Further training using specific, relevant data derived solely from reliable sources is required. Additionally, the validity of these chatbots' responses must be meticulously verified.

Language: English

Citations: 7

Capability of multimodal large language models to interpret pediatric radiological images
Thomas P. Reith, Donna M. D’Alessandro, Michael P. D’Alessandro et al.

Pediatric Radiology, Journal Year: 2024, Volume and Issue: 54(10), P. 1729 - 1737

Published: Aug. 12, 2024

Language: English

Citations: 7

Diagnostic Accuracy of Large Language Models in the European Board of Interventional Radiology Examination (EBIR) Sample Questions
Yasin Celal Güneş, Turay Cesur

CardioVascular and Interventional Radiology, Journal Year: 2024, Volume and Issue: 47(6), P. 836 - 837

Published: Feb. 22, 2024

Language: English

Citations: 6

Assessing the Accuracy of AI Models in Orthodontic Knowledge: A Comparative Study Between ChatGPT-4 and Google Bard
Sadia Naureen, Huma Ghazanfar Kiani

Journal of College of Physicians And Surgeons Pakistan, Journal Year: 2024, Volume and Issue: unknown, P. 761 - 766

Published: July 1, 2024

Objective: To compare the knowledge accuracy of ChatGPT-4 and Google Bard in response to knowledge-based questions related to orthodontic diagnosis and treatment modalities.

Language: English

Citations: 6