Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5 and Humans in Clinical Chemistry Multiple-Choice Questions
Malik Sallam, Khaled Al‐Salahat, Huda Eid, et al.

Advances in Medical Education and Practice, Year: 2024, Volume 15, pp. 857 - 871

Published: Sep 1, 2024

Artificial intelligence (AI) chatbots excel in language understanding and generation. These models can transform healthcare education and practice. However, it is important to assess the performance of such AI models across various topics to highlight their strengths and possible limitations. This study aimed to evaluate the performance of ChatGPT (GPT-3.5 and GPT-4), Bing, and Bard compared with human students at a postgraduate master's level in Medical Laboratory Sciences.

Language: English

Accuracy and Repeatability of ChatGPT Based on a Set of Multiple-Choice Questions on Objective Tests of Hearing
Krzysztof Kochanek, Henryk Skarżyński, W. Wiktor Jędrzejczak, et al.

Cureus, Year: 2024, No. unknown

Published: May 8, 2024

Introduction: ChatGPT has been tested in many disciplines, but only a few studies have involved hearing diagnosis, and none have concerned hearing physiology or audiology more generally. The consistency of the chatbot's responses when the same question is posed multiple times has not been well investigated either. This study aimed to assess the accuracy and repeatability of ChatGPT 3.5 and ChatGPT 4 on test questions concerning objective measures of hearing. Of particular interest was short-term repeatability, assessed here across four separate days extending over one week. Methods: We used 30 single-answer, multiple-choice exam questions from a one-year course on objective methods of hearing testing. The questions were put five times to both ChatGPT 3.5 (the free version) and the paid ChatGPT 4 on each of the four days (two in one week and two in the following week). Responses were evaluated against the answer key. To evaluate repeatability over time, percent agreement and Cohen's Kappa were calculated. Results: The overall accuracy of ChatGPT 3.5 was 48-49%, while that of ChatGPT 4 was 65-69%. ChatGPT 3.5 consistently failed to pass the threshold of 50% correct responses. Within a single day, repeatability was 76-79% for ChatGPT 3.5 and 87-88% for ChatGPT 4 (Cohen's Kappa of 0.67-0.71 and 0.81-0.84, respectively). Between different days, repeatability was 75-79% and 85-88% (Cohen's Kappa of 0.65-0.69 and 0.80-0.85). Conclusion: ChatGPT 4 outperforms ChatGPT 3.5, with higher accuracy and repeatability over time. However, the great variability of its responses casts doubt on possible professional applications of either version.
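
For readers unfamiliar with the two repeatability metrics used above, the short Python sketch below computes percent agreement and (unweighted) Cohen's Kappa between two sets of answers; the answer lists are hypothetical, not data from the study.

```python
# Sketch: percent agreement and Cohen's kappa between two response sets,
# the metrics used above to quantify repeatability of chatbot answers.
# The example answer lists are made up for illustration.
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items on which the two response sets give the same answer."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Expected chance agreement from the marginal answer frequencies.
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical answers (A-D) given by the same model on two different days.
day1 = list("ABCDABCDABCDABCDABCD")
day2 = list("ABCDABCDABCAABCDABCB")
print(f"agreement = {percent_agreement(day1, day2):.2f}")
print(f"kappa     = {cohens_kappa(day1, day2):.2f}")
```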

Language: English

Cited by: 14

AI-driven translations for kidney transplant equity in Hispanic populations
Oscar A. Garcia Valencia, Charat Thongprayoon, Caroline C. Jadlowiec, et al.

Scientific Reports, Year: 2024, No. 14(1)

Published: April 12, 2024

Abstract: Health equity and access to Spanish-language kidney transplant information continue to be a substantial challenge facing the Hispanic community. This study evaluated ChatGPT's capabilities in translating 54 English frequently asked questions (FAQs) into Spanish using two versions of the AI model, GPT-3.5 and GPT-4.0. The FAQs included 19 from the Organ Procurement and Transplantation Network (OPTN), 15 from the National Health Service (NHS), and 20 from the National Kidney Foundation (NKF). Two native Spanish-speaking nephrologists, both of whom are of Mexican heritage, scored the translations for linguistic accuracy and cultural sensitivity tailored to Hispanics on a 1–5 rubric. The inter-rater reliability of the evaluators, measured by Cohen's Kappa, was 0.85. Overall linguistic accuracy was 4.89 ± 0.31 for GPT-3.5 versus 4.94 ± 0.23 for GPT-4.0 (non-significant, p = 0.23). Both versions scored 4.96 ± 0.19 for cultural sensitivity (p = 1.00). By source, mean scores ranged from 4.84 ± 0.37 to 4.95 ± 0.22 across the OPTN, NHS, and NKF questions. For cultural sensitivity, both versions scored 5.00 ± 0.00 on the NKF questions, with comparably high scores for the other sources. These high scores demonstrate that ChatGPT effectively translated the kidney transplant FAQs across the three systems. The findings suggest ChatGPT's potential to promote health equity by improving access to essential transplant information. Additional research should evaluate its medical translation abilities in diverse contexts/languages. AI-driven English-to-Spanish translation may increase access to vital information for underserved Hispanic patients.
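
As an illustration of how an inter-rater reliability figure like the 0.85 above can be obtained from ordinal 1-5 rubric scores, here is a sketch of quadratically weighted Cohen's Kappa; the rater scores are invented, and the paper does not state which Kappa variant was actually used.

```python
# Sketch: quadratically weighted Cohen's kappa for ordinal 1-5 rubric scores,
# one common way to measure inter-rater reliability between two evaluators.
# The two rating lists are made up for illustration.
import numpy as np

def weighted_kappa(r1, r2, categories=(1, 2, 3, 4, 5)):
    """Cohen's kappa with quadratic disagreement weights for ordinal ratings."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    # Observed joint distribution of the two raters' scores.
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):
        obs[idx[a], idx[b]] += 1
    obs /= obs.sum()
    # Expected joint distribution if the raters scored independently.
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))
    # Quadratic penalty grows with the distance between the two scores.
    w = np.array([[(i - j) ** 2 for j in range(k)] for i in range(k)]) / (k - 1) ** 2
    return 1 - (w * obs).sum() / (w * exp).sum()

# Hypothetical scores from two raters on ten translations.
rater1 = [5, 5, 4, 5, 5, 4, 5, 3, 5, 5]
rater2 = [5, 5, 5, 5, 4, 4, 5, 3, 5, 5]
print(f"weighted kappa = {weighted_kappa(rater1, rater2):.2f}")
```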

Language: English

Cited by: 13

Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis

Hye Kyung Jin, Ha Eun Lee, Eun Young Kim, et al.

BMC Medical Education, Year: 2024, No. 24(1)

Published: Sep 16, 2024

ChatGPT, a recently developed artificial intelligence (AI) chatbot, has demonstrated improved performance on examinations in the medical field. However, thus far, an overall evaluation of the potential of ChatGPT models (ChatGPT-3.5 and GPT-4) across a variety of national health licensing examinations is lacking. This study aimed to provide a comprehensive assessment of the models' performance on national licensing examinations for medicine, pharmacy, dentistry, and nursing through a systematic review and meta-analysis.
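
To make the pooling step of such a meta-analysis concrete, here is a sketch of DerSimonian-Laird random-effects pooling of per-exam accuracies on the logit scale; this is a standard model choice, not necessarily the exact one used in the review, and the exam counts are invented.

```python
# Sketch: random-effects (DerSimonian-Laird) pooling of per-exam accuracies,
# the kind of aggregation a meta-analysis of exam performance performs.
# The (correct, total) counts are made up for illustration.
import math

def pooled_accuracy(studies):
    """Pool (correct, total) pairs on the logit scale with a DL random-effects model."""
    ys, vs = [], []
    for x, n in studies:
        ys.append(math.log(x / (n - x)))      # logit of the accuracy
        vs.append(1 / x + 1 / (n - x))        # approximate variance of the logit
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # heterogeneity
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)  # between-study variance
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return 1 / (1 + math.exp(-y_re))          # back-transform to a proportion

# Hypothetical GPT-4 results on four licensing exams: (correct, total questions).
exams = [(290, 350), (160, 200), (95, 150), (410, 500)]
print(f"pooled accuracy = {pooled_accuracy(exams):.3f}")
```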

Language: English

Cited by: 11

Large language models for life cycle assessments: Opportunities, challenges, and risks

Nathan Preuss, Abdulelah S. Alshehri, Fengqi You, et al.

Journal of Cleaner Production, Year: 2024, No. 466, article 142824

Published: June 10, 2024

Language: English

Cited by: 10

The potential of ChatGPT in medicine: an example analysis of nephrology specialty exams in Poland
Jan Nicikowski, Mikołaj Szczepański, Miłosz Miedziaszczyk, et al.

Clinical Kidney Journal, Year: 2024, No. 17(8)

Published: June 21, 2024

In November 2022, OpenAI released a chatbot named ChatGPT, a product capable of processing natural language to create human-like conversational dialogue. It has generated a lot of interest, including from the scientific and medical communities. Recent publications have shown that ChatGPT can correctly answer questions from exams such as the United States Medical Licensing Examination and other specialty exams. To date, there have been no studies anywhere in the world in which ChatGPT has been tested on the field of nephrology.

Language: English

Cited by: 10

ChatGPT (GPT-4V) Performance on the Healthcare Information Technologist Examination in Japan
Kai Ishida, Eisuke Hanada

Cureus, Year: 2025, No. unknown

Published: Jan 1, 2025

Introduction: The Chat Generative Pretrained Transformer (ChatGPT) has developed rapidly and is used in many fields, including healthcare informatics. This study evaluated ChatGPT (GPT-4V)'s performance on the Healthcare Information Technologist (HCIT) certification exam in Japan, which assesses certified professionals who work with electronic health records to improve patient care. Methodology: Four hundred seventy-six questions from the HCIT exams held over three years were targeted. ChatGPT (GPT-4V) was tested on its ability to answer them, to determine whether it could perform as well as or better than aspirants taking the exam. Performance was further analyzed by academic category, question format, presence or absence of images, and need for calculations. Results: The mean correct answer rate across all questions was 84%, which achieved the passing criteria. The correct answer rate for simple-choice (A-type) questions was higher than that for multiple-choice (X2-type) questions (P < 0.05). The success rate for questions with images was lower than for text-only questions (P < 0.01), as was that for questions requiring calculations compared with those without. Conclusions: ChatGPT (GPT-4V) met the passing criteria for the 19th to 21st exams, suggesting that it is effective and may possess the minimum knowledge, understanding, and application skills required for certification.
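
Comparisons such as the A-type versus X2-type result above (P < 0.05) are the kind of output a chi-square test on a 2x2 contingency table gives; a minimal sketch with invented counts:

```python
# Sketch: comparing correct-answer rates between two question formats with a
# chi-square test on a 2x2 contingency table, in the spirit of the A-type vs.
# X2-type comparison above. The counts are made up for illustration.
from scipy.stats import chi2_contingency

#                correct  incorrect
table = [[310, 40],   # hypothetical simple-choice (A-type) questions
         [ 90, 36]]   # hypothetical multiple-choice (X2-type) questions

stat, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p:.4f}")
```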

Language: English

Cited by: 1

Evaluating AI performance in nephrology triage and subspecialty referrals

Priscilla Koirala, Charat Thongprayoon, Jing Miao, et al.

Scientific Reports, Year: 2025, No. 15(1)

Published: Jan 27, 2025

Language: English

Cited by: 1

ChatGPT’s Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini
Filipe Prazeres

JMIR Medical Education, Year: 2025, No. 11, article e65108

Published: March 5, 2025

Advancements in ChatGPT are transforming medical education by providing new tools for assessment and learning, potentially enhancing the evaluation of doctors and improving instructional effectiveness. This study evaluates the performance and consistency of ChatGPT-3.5 Turbo and ChatGPT-4o mini in solving European Portuguese medical examination questions (the 2023 National Examination for Access to Specialized Training; Prova Nacional de Acesso à Formação Especializada [PNA]) and compares their performance with that of human candidates. Each model was tested on the first part of the examination (74 questions) on July 18, 2024, and on the second part on July 19, 2024. Each model generated an answer using its natural language processing capabilities. To test consistency, each model was asked, "Are you sure?" after providing an answer. Differences between responses were analyzed using the McNemar test with continuity correction. A single-parameter t test compared the models' performance with that of the human candidates. Frequencies and percentages were used for categorical variables, and means and CIs for numerical variables. Statistical significance was set at P<.05. ChatGPT-4o mini achieved an accuracy rate of 65% (48/74) on the 2023 PNA examination, surpassing ChatGPT-3.5 Turbo. ChatGPT-4o mini outperformed the human candidates, while ChatGPT-3.5 Turbo had a more moderate performance. This study highlights the advancements and potential of ChatGPT models in medical education, emphasizing the need for careful implementation with teacher oversight and further research.
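
The consistency analysis above rests on the McNemar test with continuity correction; the sketch below implements that statistic with made-up discordant counts.

```python
# Sketch: McNemar test with continuity correction, as used above to compare
# each model's initial answers with its answers after the "Are you sure?" prompt.
# The discordant counts below are made up for illustration.
from scipy.stats import chi2

def mcnemar_corrected(b, c):
    """McNemar chi-square with Yates continuity correction.

    b = items correct before but wrong after the follow-up prompt,
    c = items wrong before but correct after; concordant pairs drop out.
    """
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p = chi2.sf(stat, df=1)  # one degree of freedom
    return stat, p

# Hypothetical counts for 74 questions: 9 flipped correct->wrong,
# 3 flipped wrong->correct.
stat, p = mcnemar_corrected(b=9, c=3)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```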

Language: English

Cited by: 1

Exploring the impact of GAI-assisted feedback on pre-service teachers’ situational engagement and performance in inquiry-based online discussion
Guoqing Lu, Shen Ba

Educational Psychology, Year: 2025, No. unknown, pp. 1 - 26

Published: April 9, 2025

Language: English

Cited by: 1

ChatGPT-3.5 passes Poland’s medical final examination—Is it possible for ChatGPT to become a doctor in Poland?
Szymon Suwała, Paulina Szulc, Cezary Guzowski, et al.

SAGE Open Medicine, Year: 2024, No. 12

Published: Jan 1, 2024

Objectives: ChatGPT is an advanced chatbot based on a Large Language Model that has the ability to answer questions. It is undoubtedly capable of transforming communication, education, and customer support; however, can it play the role of a doctor? In Poland, prior to obtaining a medical diploma, candidates must successfully pass the Medical Final Examination. Methods: The purpose of this research was to determine how well ChatGPT performed on the Polish Medical Final Examination, which must be passed to become a doctor in Poland (the exam is considered passed if at least 56% of tasks are answered correctly). A total of 2138 categorized Medical Final Examination questions (from 11 examination sessions held between 2013–2015 and 2021–2023) were presented to ChatGPT-3.5 from 19 to 26 May 2023. For further analysis, the questions were divided into quintiles of difficulty and duration, as well as by question type (simple A-type or complex K-type). The answers provided by ChatGPT were compared with the official answer key and reviewed for any changes resulting from the advancement of medical knowledge. Results: ChatGPT correctly answered between 53.4% and 64.9% of questions and achieved passing scores in 8 out of 11 examination sessions, with an average result of about 60%. The correlation between the efficacy of the artificial intelligence and the level of question complexity, difficulty, and length was found to be negative. The AI outperformed humans in one category: psychiatry (77.18% vs. 70.25%, p = 0.081). Conclusions: ChatGPT's performance was deemed satisfactory; however, the observed results were markedly inferior to those of human graduates in the majority of instances. Despite its potential utility in many areas, the AI is constrained by inherent limitations that prevent it from entirely supplanting human medical expertise.
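
A comparison like the psychiatry result above (77.18% vs. 70.25%, p = 0.081) is typically made with a two-proportion z-test; the sketch below uses hypothetical sample sizes, so its p-value will not reproduce the published one.

```python
# Sketch: two-proportion z-test for a difference in correct-answer rates,
# e.g. AI vs. human examinees on one question category. The sample sizes
# are made up, so the p-value differs from the published 0.081.
import math
from scipy.stats import norm

def two_prop_z(x1, n1, x2, n2):
    """Z-test for the difference of two independent proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))                   # two-sided p-value

# Hypothetical counts: AI correct on 159/206 psychiatry items (77.18%) vs.
# a human cohort correct on 1448/2061 equivalent items (70.25%).
z, p = two_prop_z(159, 206, 1448, 2061)
print(f"z = {z:.2f}, p = {p:.3f}")
```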

Language: English

Cited by: 9