Cited by Examining the Role of Large Language Models in Orthopedics: Systematic Review (Preprint)

A framework for human evaluation of large language models in healthcare derived from literature review

Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor

et al.

npj Digital Medicine, Journal Year: 2024, Volume and Issue: 7(1)

Published: Sept. 28, 2024

With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties. It addresses factors such as evaluation dimensions, sample types and sizes, selection and recruitment of evaluators, frameworks and metrics, the evaluation process, and the type of statistical analysis. Our review of 142 studies shows gaps in the reliability, generalizability, and applicability of current practices. To overcome these significant obstacles to LLM development and deployment, we propose QUEST, a comprehensive and practical framework covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed around five proposed principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.

Language: English

Large language models in patient education: a scoping review of applications in medicine

Serhat Aydın, Mert Karabacak, Victoria Vlachos

et al.

Frontiers in Medicine, Journal Year: 2024, Volume and Issue: 11

Published: Oct. 29, 2024

Large Language Models (LLMs) are sophisticated algorithms that analyze and generate vast amounts of textual data, mimicking human communication. Notable LLMs include GPT-4o by OpenAI, Claude 3.5 Sonnet by Anthropic, and Gemini by Google. This scoping review aims to synthesize the current applications and potential uses of LLMs in patient education and engagement.

Language: English

Assessment of the Quality and Readability of Information Provided by ChatGPT in Relation to the Use of Platelet-Rich Plasma Therapy for Osteoarthritis

Stephen Fahy, Marcel Niemann, Peter Böhm

et al.

Journal of Personalized Medicine, Journal Year: 2024, Volume and Issue: 14(5), P. 495

Published: May 8, 2024

Objective: This study aimed to evaluate the quality and readability of information generated by ChatGPT versions 3.5 and 4 concerning platelet-rich plasma (PRP) therapy in the management of knee osteoarthritis (OA). It explored whether large language models (LLMs) could play a significant role in patient education. Design: A total of 23 common queries regarding PRP therapy for knee OA were presented to ChatGPT versions 3.5 and 4. The quality of the responses was assessed using the DISCERN criteria, and readability was evaluated using six established assessment tools. Results: Both versions produced information of moderate quality. The information provided by version 4 was significantly better than that of version 3.5, with mean DISCERN scores of 48.74 and 44.59, respectively. Both versions scored highly with respect to response relevance and placed a consistent emphasis on the importance of shared decision making. However, both versions produced content above the recommended 8th grade reading level for patient education materials (PEMs). Their reading grade levels (RGLs) were 17.18 and 16.36 for versions 3.5 and 4, respectively, indicating a potential barrier to their utility in patient education. Conclusions: Both versions demonstrated the capability to generate moderate-quality information on PRP therapy for knee OA. However, readability remains a barrier to widespread usage, with content exceeding the recommended reading level for PEMs. Although version 4 showed improvements in quality and source citation, future iterations must focus on producing more accessible content to serve as a viable resource in patient education. Collaboration between healthcare providers, organizations, and AI developers is crucial to ensure the generation of high-quality, peer-reviewed, easily understandable information that supports informed decisions.

Language: English

Patient education strategies in pediatric orthopaedics: using ChatGPT to answer frequently asked questions on scoliosis

Brigitte Lieu, E. David Crawford, Logan Laubach

et al.

Spine Deformity, Journal Year: 2025, Volume and Issue: unknown

Published: April 5, 2025

Language: English

Artificial intelligence and machine learning in knee arthroplasty

Hugo C. Rodriguez, Brandon Rust, Martin W. Roche

et al.

The Knee, Journal Year: 2025, Volume and Issue: 54, P. 28 - 49

Published: Feb. 28, 2025

Language: English

Response to “ChatGPT to answer frequently asked questions on scoliosis: comment”

Brigitte Lieu, E. David Crawford, Logan Laubach

et al.

Spine Deformity, Journal Year: 2025, Volume and Issue: unknown

Published: May 14, 2025

Language: English

A Review of Medical Ethics in Orthopaedic Surgery

Ryan Lam, Zhi Mei Sonia He, Ruhi Thapar

et al.

Journal of Bone and Joint Surgery, Journal Year: 2025, Volume and Issue: unknown

Published: May 22, 2025

➢ Medical ethics education is a required component of orthopaedic surgery resident training per the Accreditation Council for Graduate Medical Education (ACGME) guidelines, although no standardized curriculum currently exists.
➢ Beyond the 4 principles of bioethics (autonomy, beneficence, nonmaleficence, and justice), additional ethical concepts relevant to orthopaedic care include utilitarianism, deontology, virtue ethics, moral intuitionism, microethics, and narrative ethics.
➢ Ethical themes identified in the literature involved medical decision-making, the use of new technologies, caring for vulnerable patients, performing high-stakes procedures, the impact of trainee status on patient care, and attitudes regarding conflicts of interest.
➢ Ethical themes that we sought to identify but found lacking included providing care in low-resource settings, orthopaedic entrepreneurship, disability, mistreatment of trainees by their supervisors, and recognition and reporting of child and elder abuse.

Language: English

Enhancing responses from large language models with role-playing prompts: a comparative study on answering frequently asked questions about total knee arthroplasty

Yi‐Chen Chen, S. Lee, Huan Sheu

et al.

BMC Medical Informatics and Decision Making, Journal Year: 2025, Volume and Issue: 25(1)

Published: May 23, 2025

The application of artificial intelligence (AI) in medical education and patient interaction is rapidly growing. Large language models (LLMs) such as GPT-3.5, GPT-4, Google Gemini, and Claude 3 Opus have shown potential in providing relevant medical information. This study aims to evaluate and compare the performance of these LLMs in answering frequently asked questions (FAQs) about Total Knee Arthroplasty (TKA), with a specific focus on the impact of role-playing prompts. Four leading LLMs (GPT-3.5, GPT-4, Gemini, and Claude 3 Opus) were evaluated using ten standardized inquiries related to TKA. Each model produced two distinct responses per question: one generated under zero-shot prompting (question-only) and one under role-playing prompting (instructed to simulate an experienced orthopaedic surgeon). Orthopaedic surgeons evaluated the responses for accuracy and comprehensiveness on a 5-point Likert scale, along with a binary measure of acceptability. Statistical analyses (Wilcoxon rank sum and Chi-squared tests; P < 0.05) were conducted to compare performance. ChatGPT-4 with role-playing prompts achieved the highest scores for accuracy (3.73), comprehensiveness (4.05), and acceptability (77.5%). ChatGPT-3.5 followed closely (3.70, 3.85, and 72.5%, respectively). Gemini demonstrated lower scores across all metrics. In between-model comparisons, some models scored significantly higher than others (P = 0.031 and 0.009; P = 0.019 and 0.002; P = 0.006). Within-model comparisons showed that role-playing prompts improved metrics (P = 0.033). No significant effects were observed for some models, including Claude. This study demonstrates that role-playing prompts can enhance the performance of LLMs, particularly ChatGPT-4, in answering FAQs about TKA. With role-playing prompts, ChatGPT-4 was superior in terms of accuracy, comprehensiveness, and acceptability. Despite occasional inaccuracies, LLMs hold promise for improving clinical decision-making in practice. Trial registration: Not applicable.

Language: English

Evaluating AI-Generated informed consent documents in oral surgery: A comparative study of ChatGPT-4, Bard gemini advanced, and human-written consents

Luigi Angelo Vaira, Jérôme R. Lechien, Antonino Maniaci

et al.

Journal of Cranio-Maxillofacial Surgery, Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 1, 2024

Language: English

Examining the Role of Large Language Models in Orthopedics: Systematic Review

Cheng Zhang, Shanshan Liu, Xingyu Zhou

et al.

Journal of Medical Internet Research, Journal Year: 2024, Volume and Issue: 26, P. e59607

Published: Nov. 15, 2024

Background: Large language models (LLMs) can understand natural language and generate corresponding text, images, and even videos based on prompts, which holds great potential in medical scenarios. Orthopedics is a significant branch of medicine, and orthopedic diseases contribute to a significant socioeconomic burden, which could be alleviated by the application of LLMs. Several pioneers in orthopedics have conducted research on LLMs across various subspecialties to explore their performance in addressing different issues. However, there are currently few reviews and summaries of these studies, and a systematic summary of existing research is absent. Objective: The objective of this review was to comprehensively summarize research findings on LLMs in the field of orthopedics and to explore their opportunities and challenges. Methods: The PubMed, Embase, and Cochrane Library databases were searched from January 1, 2014, to February 22, 2024, with language limited to English. The search terms included variants of “large language model,” “generative artificial intelligence,” “ChatGPT,” and “orthopaedics.” They were divided into 2 categories: large language model and orthopedics. After the search was completed, studies were selected according to the inclusion and exclusion criteria. The quality of the included studies was assessed using the revised Cochrane risk-of-bias tool for randomized trials and the CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) guidance. Data extraction and synthesis were conducted after the quality assessment. Results: A total of 68 studies were selected. The studies involved the fields of clinical practice, education, research, and management. Of these, 47 (69%) focused on clinical practice, 12 (18%) addressed education, 8 (12%) were related to scientific research, and 1 (1%) pertained to management. Only a few studies recruited patients, and high-quality randomized controlled trials were scarce. ChatGPT was the most commonly mentioned LLM tool. There was considerable heterogeneity in the definition, measurement, and evaluation of LLMs' performance across studies. For diagnostic tasks alone, accuracy ranged from 55% to 93%. For disease classification tasks, GPT-4's accuracy ranged from 2% to 100%. For answering examination questions, scores ranged from 45% to 73.6% due to differences in test selections. Conclusions: LLMs cannot replace orthopedic professionals in the short term. However, using LLMs as copilots could be an approach to effectively enhance work efficiency at present. More research is needed in the future to identify optimal applications of LLMs and to advance toward higher precision.

Language: English
