Fairness in AI-Driven Oncology: Investigating Racial and Gender Biases in Large Language Models
Anjali Agrawal

Cureus, Journal year: 2024, Issue: unknown

Published: Sep. 16, 2024

Large language model (LLM) chatbots have many applications in medical settings. However, these tools can potentially perpetuate racial and gender biases through their responses, worsening disparities in healthcare. With the ongoing discussion of LLM use in oncology and the widespread goal of addressing cancer disparities, this study focuses on the biases propagated by LLMs in oncology.

Language: English

Evaluation of the Performance of Three Large Language Models in Clinical Decision Support: A Comparative Study Based on Actual Cases
Xueqi Wang, Haiyan Ye, S. H. Zhang, et al.

Journal of Medical Systems, Journal year: 2025, Issue: 49(1)

Published: Feb. 14, 2025

Language: English

Cited: 1

Assessing the Potential of USMLE-Like Exam Questions Generated by GPT-4
Suhana Bedi, Scott L. Fleming, Chia‐Chun Chiang, et al.

medRxiv (Cold Spring Harbor Laboratory), Journal year: 2023, Issue: unknown

Published: April 28, 2023

The United States Medical Licensing Examination (USMLE) is a critical step in assessing the competence of future physicians, yet the process of creating exam questions and study materials is both time-consuming and costly. While Large Language Models (LLMs), such as OpenAI’s GPT-4, have demonstrated proficiency in answering medical questions, their potential for generating such questions remains underexplored. This study presents QUEST-AI, a novel system that utilizes LLMs to (1) generate USMLE-style questions, (2) identify and flag incorrect questions, and (3) correct errors in the flagged questions. We evaluated this system’s output by constructing a test set of 50 LLM-generated questions mixed with human-generated questions and conducting a two-part assessment with three physicians and two medical students. The assessors attempted to distinguish between LLM-generated and human-generated questions and to evaluate the validity of the content. A majority of the questions generated by QUEST-AI were deemed valid by the panel of clinicians, with strong correlations between performance on LLM-generated and human-generated questions. This pioneering application of LLMs in medical education could significantly increase the ease and efficiency of developing exam content, offering a cost-effective and accessible alternative for USMLE preparation.
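The three-stage pipeline described above (generate, flag, correct) can be sketched as a short control loop. The following Python sketch is illustrative only; StubLLM and its methods are hypothetical stand-ins for the paper's actual model calls, which the abstract does not specify.

class StubLLM:
    """Hypothetical stand-in for the LLM calls a QUEST-AI-style system would make."""
    def generate_question(self, topic: str) -> str:
        return f"A 54-year-old presents with {topic}. What is the best next step?"
    def flag_errors(self, question: str) -> list[str]:
        return []  # a real flagging pass would return detected issues
    def correct_question(self, question: str, issues: list[str]) -> str:
        return question  # a real pass would rewrite the item to fix the issues

def quest_ai_pipeline(llm: StubLLM, topic: str) -> str:
    draft = llm.generate_question(topic)             # stage 1: draft a USMLE-style item
    issues = llm.flag_errors(draft)                  # stage 2: identify and flag errors
    if issues:
        draft = llm.correct_question(draft, issues)  # stage 3: correct flagged items
    return draft

print(quest_ai_pipeline(StubLLM(), "acute chest pain"))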

Language: English

Cited: 12

A guide to prompt design: foundations and applications for healthcare simulationists

Sara Maaz, Janice C. Palaganas, Gerry Palaganas, et al.

Frontiers in Medicine, Journal year: 2025, Issue: 11

Published: Jan. 30, 2025

As Large Language Models (LLMs) like ChatGPT, Gemini, and Claude gain traction in healthcare simulation, this paper offers simulationists a practical guide to effective prompt design. Grounded in a structured literature review and iterative testing, it proposes best practices for developing calibrated prompts, explores various prompt types and techniques with use cases, and addresses the challenges of using LLMs in simulation, including ethical considerations. The guide helps bridge the knowledge gap on LLM use in simulation-based education by offering tailored guidance for simulationists. Examples were created through iterative testing to ensure alignment with simulation objectives, covering use cases such as clinical scenario development, OSCE station creation, simulated person scripting, and debriefing facilitation. These examples provide easy-to-apply methods to enhance the realism, engagement, and educational value of simulations. Key challenges associated with LLM integration, including bias, privacy concerns, hallucinations, lack of transparency, and the need for robust oversight and evaluation, are discussed alongside considerations unique to simulation-based education. Recommendations are provided to help simulationists craft prompts that align with educational objectives while mitigating these challenges. By offering these insights, the paper contributes valuable and timely guidance for simulationists seeking to leverage generative AI’s capabilities in healthcare simulation education responsibly.
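To make the idea of a calibrated, structured prompt concrete, here is a minimal template of the kind a simulationist might use for simulated person scripting. The fields and wording are illustrative assumptions, not the paper's published template.

# Illustrative structured prompt for a simulated-patient scenario (not from the paper).
PROMPT_TEMPLATE = """\
Role: You are a simulated patient in an OSCE station.
Scenario: {scenario}
Learner level: {level}
Constraints:
- Reveal history only when asked; do not volunteer the diagnosis.
- Stay in character throughout the encounter.
Output format: first-person dialogue, one reply per learner question.
"""

prompt = PROMPT_TEMPLATE.format(
    scenario="58-year-old with 2 hours of crushing substernal chest pain",
    level="second-year medical student",
)
print(prompt)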

Language: English

Cited: 0

Novel Evaluation Metric and Quantified Performance of ChatGPT-4 Patient Management Simulations for Early Clinical Education: Experimental Study (Preprint)
Riley Scherr, Aidin Spina, Allen Dao, et al.

JMIR Formative Research, Journal year: 2025, Issue: 9, pp. e66478

Published: Jan. 31, 2025

Abstract. Background: Case studies have shown that ChatGPT can run clinical simulations at the medical student level. However, no data have assessed ChatGPT’s reliability in meeting desired simulation criteria such as accuracy, formatting, and robust feedback mechanisms. Objective: This study aims to quantify ChatGPT’s ability to consistently follow formatting instructions and create simulations for preclinical learners according to principles of multimedia educational technology. Methods: Using ChatGPT-4 and a prevalidated starting prompt, the authors ran 360 separate simulations of an acute asthma exacerbation. A total of 180 simulations were given correct answers and 180 were given incorrect answers. ChatGPT was evaluated on its ability to adhere to basic simulation parameters (stepwise progression, free response, interactivity), advanced simulation parameters (autonomous conclusion, delayed feedback, comprehensive feedback), and accuracy (vignette, treatment updates, feedback). Significance was determined with χ² analyses using 95% CIs and odds ratios. Results: In total, 100% (n=360) of simulations met all basic parameters and were medically accurate. For advanced parameters, 55% (200/360) of simulations met all three, and the Correct arm (157/180, 87%) did so significantly more often than the Incorrect arm (43/180, 24%; P<.001). Overall, 79% (285/360) of simulations concluded autonomously, with no difference between arms in autonomous conclusion (146/180, 81%, vs 139/180, 77%; P=.36), and 78% (282/360) gave comprehensive feedback (137/180, 76%, vs 145/180, 81%; P=.31). Simulations were not more likely to conclude autonomously (P=.34) or provide comprehensive feedback (P=.27) when immediate feedback was compared with delayed feedback. Conclusions: These data suggest ChatGPT’s potential to be a reliable tool for running simple clinical simulations, as assessed by a novel 9-part metric. Per this metric, ChatGPT performed perfectly on basic parameters and accuracy, and it performed well on autonomous conclusion. Delayed feedback depended on user inputs, and comprehensive feedback was the one parameter ChatGPT did not consistently meet. Further work must be done to ensure consistent performance across a broader range of simulation scenarios.
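The headline advanced-parameter contrast (157/180 vs 43/180) can be checked from the reported counts alone. This Python sketch reconstructs the 2×2 table from the abstract's summary numbers; it is a reanalysis for illustration, not the study's own code.

from scipy.stats import chi2_contingency

# Rows: Correct arm, Incorrect arm; columns: met all advanced parameters, did not.
table = [[157, 180 - 157],
         [43, 180 - 43]]

chi2, p, dof, expected = chi2_contingency(table)
odds_ratio = (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])
print(f"chi2 = {chi2:.1f}, p = {p:.1e}, OR = {odds_ratio:.1f}")  # p < .001, as reported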

Language: English

Cited: 0

A Review of Large Language Models in Medical Education, Clinical Decision Support, and Healthcare Administration
Josip Vrdoljak, Zvonimir Boban, Marino Vilović, et al.

Healthcare, Journal year: 2025, Issue: 13(6), pp. 603

Published: March 10, 2025

Background/Objectives: Large language models (LLMs) have shown significant potential to transform various aspects of healthcare. This review aims to explore the current applications, challenges, and future prospects of LLMs in medical education, clinical decision support, and healthcare administration. Methods: A comprehensive literature review was conducted, examining LLM applications across the three key domains. The analysis included their performance and recent advancements, with a focus on techniques like retrieval-augmented generation (RAG). Results: In medical education, LLMs show promise as virtual patients, personalized tutors, and tools for generating study materials; some have outperformed junior trainees in specific knowledge assessments. Concerning clinical decision support, LLMs exhibit potential in diagnostic assistance, treatment recommendations, and knowledge retrieval, though performance varies across specialties and tasks. In healthcare administration, they can effectively automate tasks such as note summarization, data extraction, and report generation, potentially reducing administrative burdens on healthcare professionals. Despite this promise, challenges persist, including hallucination mitigation, addressing biases, and ensuring patient privacy and data security. Conclusions: LLMs hold transformative potential in medicine but require careful integration into clinical settings. Ethical considerations, regulatory oversight, and interdisciplinary collaboration between AI developers and healthcare professionals are essential. Future advancements in LLM reliability through techniques such as RAG, fine-tuning, and reinforcement learning will be critical to ensuring safety and improving healthcare delivery.
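Since the review highlights retrieval-augmented generation (RAG), a toy sketch may help fix the idea: retrieve the most relevant document, then condition the model on it. Everything below (the two-note corpus, bag-of-words cosine scoring, and the prompt builder) is an illustrative assumption; production systems use dense embeddings and a real LLM call.

from collections import Counter
import math

corpus = [
    "Asthma exacerbations are treated with inhaled bronchodilators and steroids.",
    "Metformin is first-line pharmacotherapy for type 2 diabetes.",
]

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm

def retrieve(query: str) -> str:
    # Pick the corpus note most lexically similar to the query.
    q = Counter(query.lower().split())
    return max(corpus, key=lambda doc: cosine(q, Counter(doc.lower().split())))

def build_prompt(query: str) -> str:
    # Grounding the model in retrieved text is what mitigates hallucination.
    return f"Context: {retrieve(query)}\nQuestion: {query}\nAnswer using only the context."

print(build_prompt("How is an asthma exacerbation treated?"))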

Language: English

Cited: 0

Integrating artificial intelligence into pre-clinical medical education: challenges, opportunities, and recommendations

Birgit Pohn, Lars Mehnen, Sebastian Fitzek, et al.

Frontiers in Education, Journal year: 2025, Issue: 10

Published: March 26, 2025

Cited: 0

Mapping the use of artificial intelligence in medical education: a scoping review
Erwin Hernando Hernández Rincón, Daniel Jiménez, Liliana Aguilar, et al.

BMC Medical Education, Journal year: 2025, Issue: 25(1)

Published: April 12, 2025

Language: English

Cited: 0

Integrating ChatGPT in Orthopedic Education for Medical Undergraduates: Randomized Controlled Trial
Wenyi Gan, Jianfeng Ouyang, Hua Li, et al.

Journal of Medical Internet Research, Journal year: 2024, Issue: 26, pp. e57037

Published: Aug. 20, 2024

Background: ChatGPT is a natural language processing model developed by OpenAI that can be iteratively updated and optimized to accommodate the changing and complex requirements of human verbal communication. Objective: The study aimed to evaluate ChatGPT’s accuracy in answering orthopedics-related multiple-choice questions (MCQs) and to assess its short-term effects as a learning aid through a randomized controlled trial. In addition, long-term effects on student performance in other subjects were measured using final examination results. Methods: We first evaluated ChatGPT’s accuracy on MCQs pertaining to orthopedics across various question formats. Then, 129 undergraduate medical students participated in a randomized controlled trial: the ChatGPT group used the tool as a learning aid, while the control group was prohibited from using artificial intelligence software to support learning. Following the 2-week intervention, the 2 groups’ understanding of orthopedics was assessed with a test, and variations in performance in other disciplines were noted at follow-up at the end of the semester. Results: ChatGPT-4.0 answered 1051 orthopedics-related MCQs with a 70.60% (742/1051) accuracy rate, including 71.8% (237/330) for A1 MCQs, 73.7% (330/448) for A2 MCQs, 70.2% (92/131) for A3/4 MCQs, and 58.5% (83/142) for case analysis MCQs. As of April 7, 2023, a total of 129 individuals had joined the experiment. However, 19 withdrew during the experiment phases; thus, by July 1, 110 participants had completed the trial and all its work. After the short-term intervention, the ChatGPT group answered more questions correctly than the control group (ChatGPT group: mean 141.20, SD 26.68; control group: mean 130.80, SD 25.56; P=.04), particularly A1 MCQs (mean 46.57, SD 8.52, vs mean 42.18, SD 9.43; P=.01), A2 MCQs (mean 60.59, SD 10.58, vs mean 56.66, SD 9.91; P=.047), and case analysis MCQs (mean 19.57, SD 5.48, vs mean 16.46, SD 4.58; P=.002). At the end of the semester, we found that the ChatGPT group also performed better than the control group on final examinations in surgery (mean 76.54, SD 9.79, vs mean 72.54, SD 8.11; P=.02) and obstetrics and gynecology (mean 75.98, SD 8.94, vs control SD 8.66). Conclusions: ChatGPT answers orthopedics-related MCQs accurately, and students using it excelled in both short-term and long-term assessments. Our findings strongly support the integration of ChatGPT into medical education, enhancing contemporary instructional methods. Trial Registration: Chinese Clinical Trial Registry ChiCTR2300071774; https://www.chictr.org.cn/hvshowproject.html?id=225740&v=1.0
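The reported orthopedics-test difference (mean 141.20, SD 26.68, vs mean 130.80, SD 25.56; P=.04) can be approximately reproduced from summary statistics alone. The Python sketch below assumes the 110 completers split evenly across arms, which the abstract does not state.

from scipy.stats import ttest_ind_from_stats

# n=55 per group is an assumption; only the total of 110 completers is reported.
t, p = ttest_ind_from_stats(mean1=141.20, std1=26.68, nobs1=55,
                            mean2=130.80, std2=25.56, nobs2=55)
print(f"t = {t:.2f}, p = {p:.3f}")  # ~.04 under the equal-arm assumption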

Language: English

Cited: 3

Evaluating ChatGPT’s moral competence in health care-related ethical problems

Ahmed A Rashid, Ryan A Skelly, Carlos A Valdes, et al.

JAMIA Open, Journal year: 2024, Issue: 7(3)

Published: July 1, 2024

Abstract. Objectives: Artificial intelligence tools such as Chat Generative Pre-trained Transformer (ChatGPT) have been used for many health care-related applications; however, there is a lack of research on their capabilities for evaluating morally and/or ethically complex medical decisions. The objective of this study was to assess the moral competence of ChatGPT. Materials and methods: This cross-sectional study was performed between May 2023 and July 2023 using scenarios from the Moral Competence Test (MCT). Numerical responses were collected from ChatGPT 3.5 and 4.0 to assess individual and overall stage scores, including the C-index and overall stage preference. Descriptive analysis and 2-sided Student’s t-tests were performed for all continuous data. Results: A total of 100 iterations of the MCT were performed. Stage preference was found to be higher in the latter Kohlberg-derived arguments for ChatGPT 4.0 (2.325 versus 1.755) when compared with ChatGPT 3.5. ChatGPT 4.0 also had a statistically significantly higher C-index score in the comparison (29.03 ± 11.10 versus 19.32 ± 10.95, P=.0000275). Discussion: Both models trended towards the higher stages of Kohlberg’s theory for both dilemmas, with C-indices suggesting medium moral competence. However, the models showed moderate variation in scores, indicating inconsistency, and further training is recommended. Conclusion: ChatGPT demonstrates that it can evaluate moral arguments based on Kohlberg’s theory of moral development. These findings suggest that future revisions of ChatGPT and other large language models could assist physicians in the decision-making process when encountering complex ethical scenarios.
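The C-index comparison (29.03 ± 11.10 vs 19.32 ± 10.95, P=.0000275) can likewise be reproduced from summary statistics. Splitting the 100 MCT iterations as 50 per model is an inference, made here because it yields the reported P value; the abstract does not state the per-model counts.

from scipy.stats import ttest_ind_from_stats

# n=50 per model is assumed, not stated; 100 total iterations are reported.
t, p = ttest_ind_from_stats(mean1=29.03, std1=11.10, nobs1=50,
                            mean2=19.32, std2=10.95, nobs2=50)
print(f"t = {t:.2f}, p = {p:.7f}")  # ~0.0000275, matching the abstract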

Language: English

Cited: 1

Capabilities of GPT-4o and Gemini 1.5 Pro in Gram stain and bacterial shape identification
Joya-Rita Hindy, Tarek Souaid, Christopher Kovacs, et al.

Future Microbiology, Journal year: 2024, Issue: 19(15), pp. 1283-1292

Published: July 29, 2024

Aim: To assess the visual accuracy of two large language models (LLMs) in microbial classification. Materials & methods: GPT-4o and Gemini 1.5 Pro were evaluated on distinguishing Gram-positive from Gram-negative bacteria and on classifying them as cocci or bacilli, using 80 Gram stain images from a labeled database. Results: GPT-4o achieved 100% accuracy in simultaneously identifying the Gram stain and shape for Clostridium perfringens, Pseudomonas aeruginosa and Staphylococcus aureus. Gemini 1.5 Pro showed more variability for the same organisms (45, 100 and 95%, respectively). Both LLMs failed to identify both the Gram stain and bacterial shape of Neisseria gonorrhoeae. Cumulative accuracy plots indicated that GPT-4o consistently performed equally well or better than Gemini 1.5 Pro in every identification task, except for N. gonorrhoeae’s shape. Conclusion: These results suggest that these LLMs in their unprimed state are not ready to be implemented in clinical practice and highlight the need for more research with larger datasets to improve LLMs’ effectiveness in clinical microbiology.
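The joint stain-plus-shape scoring used in the study can be illustrated with a small tally. The records below are made-up stand-ins for the 80 labeled images, not the study's data; only the scoring rule (both attributes must be correct) follows the abstract.

from collections import defaultdict

# (organism, true_stain, true_shape, predicted_stain, predicted_shape)
records = [
    ("S. aureus", "positive", "cocci", "positive", "cocci"),
    ("P. aeruginosa", "negative", "bacilli", "negative", "bacilli"),
    ("N. gonorrhoeae", "negative", "cocci", "positive", "bacilli"),
]

hits, totals = defaultdict(int), defaultdict(int)
for organism, stain, shape, pred_stain, pred_shape in records:
    totals[organism] += 1
    # Count an image as correct only if stain AND shape are both right.
    hits[organism] += (stain == pred_stain) and (shape == pred_shape)

for organism in totals:
    print(f"{organism}: {100 * hits[organism] / totals[organism]:.0f}% correct")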

Language: English

Cited: 1