Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine
Yudai Kaneda,

Akari Tayuinosho,

Rika Tomoyose

et al.

Journal of Evaluation in Clinical Practice, Journal Year: 2024, Volume and Issue: 30(6), P. 1017 - 1023

Published: May 19, 2024

ChatGPT, a large-scale language model, is a notable example of AI's potential in health care. However, its effectiveness in clinical settings, especially when compared to human physicians, is not fully understood. This study evaluates ChatGPT's capabilities and limitations in answering questions intended for Japanese internal medicine specialists, aiming to clarify its accuracy and its tendencies in both correct and incorrect responses.

Language: English

Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy

Murat Tepe,

Emre Emekli

Cureus, Journal Year: 2024, Volume and Issue: unknown

Published: May 9, 2024

Background Large language models (LLMs), such as ChatGPT-4, Gemini, and Microsoft Copilot, have been instrumental in various domains, including healthcare, where they enhance health literacy and aid patient decision-making. Given the complexities involved in breast imaging procedures, accurate and comprehensible information is vital for patient engagement and compliance. This study aims to evaluate the readability and accuracy of the answers provided by three prominent LLMs in response to frequently asked questions in breast imaging, assessing their potential to improve patient understanding and facilitate healthcare communication. Methodology We collected the most common questions on breast imaging from clinical practice and posed them to the LLMs. We then evaluated the responses in terms of readability and accuracy. Responses from the LLMs were analyzed using the Flesch Reading Ease and Flesch-Kincaid Grade Level tests and through a radiologist-developed Likert-type scale. Results The study found significant variations among the LLMs. Gemini and Microsoft Copilot scored higher on the readability scales (p < 0.001), indicating their responses were easier to understand. In contrast, ChatGPT-4 demonstrated greater accuracy in its responses (p < 0.001). Conclusions While ChatGPT-4 shows promise in providing accurate responses, readability issues may limit its utility in patient education. Conversely, Gemini and Microsoft Copilot, despite being less accurate, are more accessible to a broader audience. Ongoing adjustments and evaluations of these models are essential to ensure that they meet the diverse needs of patients, emphasizing the need for continuous improvement and oversight in the deployment of artificial intelligence technologies in healthcare.
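The Flesch Reading Ease and Flesch-Kincaid Grade Level tests used in this study are standard readability formulas over sentence and word counts. A minimal sketch of both formulas follows; the vowel-group syllable counter is a crude heuristic of my own (published tools typically use dictionary-based syllable counts), so scores will differ slightly from those in the paper.

```python
import re


def count_syllables(word: str) -> int:
    # Heuristic: count runs of consecutive vowels; every word has >= 1 syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for `text`."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_sents = max(1, len(sentences))
    n_syll = sum(count_syllables(w) for w in words)
    # Standard published coefficients for both formulas.
    fre = 206.835 - 1.015 * (n_words / n_sents) - 84.6 * (n_syll / n_words)
    fkgl = 0.39 * (n_words / n_sents) + 11.8 * (n_syll / n_words) - 15.59
    return fre, fkgl
```

Higher Reading Ease means easier text (the Gemini/Copilot finding above), while a higher Grade Level means more years of schooling are needed to follow it.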

Language: English

Citations

21

Effectiveness of AI-powered Chatbots in responding to orthopaedic postgraduate exam questions—an observational study
Raju Vaishya, Karthikeyan P. Iyengar, Mohit Kumar Patralekh

et al.

International Orthopaedics, Journal Year: 2024, Volume and Issue: 48(8), P. 1963 - 1969

Published: April 15, 2024

Language: English

Citations

17

Evaluating the performance of ChatGPT-4 on the United Kingdom Medical Licensing Assessment
U Hin Lai,

Keng Sam Wu,

Ting-Yu Hsu

et al.

Frontiers in Medicine, Journal Year: 2023, Volume and Issue: 10

Published: Sept. 19, 2023

Recent developments in artificial intelligence large language models (LLMs), such as ChatGPT, have allowed for the understanding and generation of human-like text. Studies have found LLMs able to perform well on various examinations, including those in law, business, and medicine. This study aims to evaluate the performance of ChatGPT on the United Kingdom Medical Licensing Assessment (UKMLA).

Language: English

Citations

41

Decoding medical jargon: The use of AI language models (ChatGPT-4, BARD, microsoft copilot) in radiology reports

Murat Tepe,

Emre Emekli

Patient Education and Counseling, Journal Year: 2024, Volume and Issue: 126, P. 108307 - 108307

Published: May 3, 2024

Language: English

Citations

15

ChatGPT as an innovative heutagogical tool in medical education

Nudrat Saleem,

Tabish Mufti,

Shahab Saquib Sohail

et al.

Cogent Education, Journal Year: 2024, Volume and Issue: 11(1)

Published: March 28, 2024

In this study, we aim to investigate the potential advantages of integrating the new generative artificial intelligence (AI) technology, ChatGPT, into higher education, specifically within the field of medical education. The focus is on exploring ChatGPT's applications in personalized learning, assessment, and content creation, while also addressing the management of its limitations and ethical considerations. Furthermore, we explore the use of ChatGPT as an instructor in a classroom context. We seek to elucidate its responses to preset questions in two categories, separated based on targeted remediation, pedagogical knowledge, teacher ethics, query detail, practicality, and communication pattern. These responses are analyzed against rubrics designed on the basis of these prerequisites, and findings are reached through a thorough comparative analysis. We hope that this research will improve the effective implementation of ChatGPT as a tool for enhancing learning and skill development while maintaining the awareness of professionals.

Language: English

Citations

11

Big claims, low outcomes: fact checking ChatGPT’s efficacy in handling linguistic creativity and ambiguity
Md. Tauseef Qamar, Juhi Yasmeen, Sanket Kumar Pathak

et al.

Cogent Arts and Humanities, Journal Year: 2024, Volume and Issue: 11(1)

Published: June 18, 2024

Ambiguity has always been a pain in the neck of Natural Language Processing (NLP). Despite the enormous number of AI tools for human language processing, it remains a key concern for technology researchers to develop a linguistically intelligent tool that can effectively understand the linguistic ambiguity and creativity possessed by human language. In this regard, the newly designed ChatGPT has dramatically attracted attention due to its remarkable ability to answer questions from a wide range of domains, which needs a reality check. This article scrutinises ChatGPT's ability to interpret neologisms, codemixing, and ambiguous sentences. For this, we have tested lexically, syntactically, and semantically ambiguous expressions and codemixed words, as well as a few language-game instances. The findings show that ChatGPT still fails on complex sentences, specifically those common in everyday discourse or not part of any standard textbook. More specifically, ambiguous sentences and language games remain an uphill task for it to understand. The study discusses implications for further improving the output of ChatGPT.

Language: English

Citations

6

Medical knowledge of ChatGPT in public health, infectious diseases, COVID-19 pandemic, and vaccines: multiple choice questions examination based performance
Sultan Ayoub Meo, Metib Alotaibi,

Muhammad Zain Sultan Meo

et al.

Frontiers in Public Health, Journal Year: 2024, Volume and Issue: 12

Published: April 17, 2024

Background At the beginning of the year 2023, the Chatbot Generative Pre-Trained Transformer (ChatGPT) gained remarkable attention from the public. There is great discussion about ChatGPT and its knowledge in the medical sciences; however, the literature lacks an evaluation of its knowledge level in public health. Therefore, this study investigates ChatGPT's knowledge in public health, infectious diseases, the COVID-19 pandemic, and vaccines. Methods A Multiple Choice Questions (MCQs) bank was established. The questions' contents were reviewed, and it was confirmed that the questions were appropriate to the contents. The MCQs were based on case scenarios, each with four sub-stems and a single correct answer. From the bank, 60 MCQs were selected: 30 on public health and infectious diseases topics, 17 on the COVID-19 pandemic, and 13 on vaccines. Each MCQ was manually entered, and ChatGPT was given the task of answering the MCQs. Results Out of the total 60 MCQs on public health, infectious diseases, COVID-19, and vaccines, ChatGPT attempted all of them and obtained 17/30 (56.66%) marks on public health and infectious diseases, 15/17 (88.23%) on COVID-19, and 12/13 (92.30%) on vaccine MCQs, for an overall score of 44/60 (73.33%). The number of correct answers in each section was significantly higher (p = 0.001). ChatGPT obtained satisfactory grades in all three domains of the public health and pandemic-allied examination. Conclusion ChatGPT has satisfactory knowledge of public health, infectious diseases, the COVID-19 pandemic, and vaccines. In the future, ChatGPT may assist educators, academicians, and healthcare professionals in providing a better understanding of these domains.
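The abstract's p = 0.001 comes from the authors' own section-wise comparison, but its reported score of 44/60 on four-option MCQs can be checked against pure guessing (25% expected accuracy) with an exact binomial tail probability. A small stdlib-only sketch, using only the numbers reported above:

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed as an exact tail sum."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Reported result: 44 correct out of 60 four-option MCQs.
# Under pure guessing, each question has a 0.25 chance of being correct.
p_value = binom_sf(44, 60, 0.25)
```

The tail probability is vanishingly small, so the 73.33% overall score is far above what guessing alone could produce; this is an illustrative check, not the test the authors ran.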

Language: English

Citations

4

From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance
Markus Kipp

Information, Journal Year: 2024, Volume and Issue: 15(9), P. 543 - 543

Published: Sept. 5, 2024

ChatGPT is a large language model trained on increasingly large datasets to perform diverse language-based tasks. It is capable of answering multiple-choice questions, such as those posed by medical examinations, and has been generating considerable attention in both academic and non-academic domains in recent months. In this study, we aimed to assess GPT's performance on anatomical questions retrieved from medical licensing examinations in Germany. Two different versions were compared. GPT-3.5 demonstrated moderate accuracy, correctly answering 60–64% of the questions from the autumn 2022 and spring 2021 exams. In contrast, GPT-4.o showed significant improvement, achieving 93% accuracy on the autumn 2022 exam and 100% on the spring 2021 exam. When tested on 30 unique questions not available online, GPT-4.o maintained a 96% accuracy rate. Furthermore, GPT-4.o consistently outperformed medical students across six state exams, with a statistically significant mean score of 95.54% compared with the students' 72.15%. The study demonstrates that GPT-4.o outperforms its predecessor, GPT-3.5, and a cohort of medical students, indicating its potential as a powerful tool in medical education and assessment. This improvement highlights the rapid evolution of LLMs and suggests that AI could play an important role in supporting and enhancing medical training, potentially offering supplementary resources for professionals. However, further research is needed to assess the limitations and practical applications of such systems in real-world practice.

Language: English

Citations

4

Comparison of the performance of French orthopedic surgery residents and the ChatGPT-4/4o artificial intelligence on the examinations of the French specialty diploma (DES) in orthopedic and trauma surgery

Nabih Maraqa,

Ramy Samargandi,

A. Poichotte

et al.

Revue de Chirurgie Orthopédique et Traumatologique, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 1, 2025

Citations

0

Capable exam-taker and question-generator: the dual role of generative AI in medical education assessment
Yihong Qiu, Chang Liu

Global Medical Education, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 13, 2025

Abstract Objectives Artificial intelligence (AI) is being increasingly used in medical education. This narrative review presents a comprehensive analysis of generative AI tools' performance in answering and generating exam questions, thereby providing a broader perspective on AI's strengths and limitations in the medical education context. Methods The Scopus database was searched for studies on generative AI in medical examinations published from 2022 to 2024. Duplicates were removed, and relevant full texts were retrieved following inclusion and exclusion criteria. Narrative analysis and descriptive statistics were used to analyze the contents of the included studies. Results A total of 70 studies were included in the analysis. The results showed that AI performance varied across different types of questions and specialty exams, with the best average accuracy in psychiatry, and that performance was influenced by prompts. With well-crafted prompts, models can efficiently produce high-quality examination questions. Conclusion Generative AI possesses the ability to answer and produce examination questions using carefully designed prompts. Its potential use in medical education assessment is vast, ranging from detecting question errors and aiding exam preparation to facilitating formative assessments and supporting personalized learning. However, it's crucial for educators to always double-check its responses to maintain accuracy and prevent the spread of misinformation.

Language: English

Citations

0