Comparative evaluation of artificial intelligence models GPT-4 and GPT-3.5 in clinical decision-making in sports surgery and physiotherapy: a cross-sectional study DOI Creative Commons
Sönmez Sağlam, Veysel Uludağ, Zekeriya Okan Karaduman, et al.

BMC Medical Informatics and Decision Making, Journal Year: 2025, Volume and Issue: 25(1)

Published: April 14, 2025

The integration of artificial intelligence (AI) in healthcare has rapidly expanded, particularly in clinical decision-making. Large language models (LLMs) such as GPT-4 and GPT-3.5 have shown potential in various medical applications, including diagnostics and treatment planning. However, their efficacy in specialized fields like sports surgery and physiotherapy remains underexplored. This study aims to compare the performance of GPT-4 and GPT-3.5 in clinical decision-making within these domains using a structured assessment approach. This cross-sectional study included 56 professionals specializing in sports surgery and physiotherapy. Participants evaluated 10 standardized clinical scenarios generated by GPT-4 and GPT-3.5 using a 5-point Likert scale. The scenarios encompassed common musculoskeletal conditions, and assessments focused on diagnostic accuracy, treatment appropriateness, surgical technique detailing, and rehabilitation plan suitability. Data were collected anonymously via Google Forms. Statistical analysis included paired t-tests for direct model comparisons, one-way ANOVA to assess performance across multiple criteria, and Cronbach's alpha to evaluate inter-rater reliability. GPT-4 significantly outperformed GPT-3.5 across all criteria. Paired t-test results (t(55) = 10.45, p < 0.001) demonstrated that GPT-4 provided more accurate diagnoses, superior treatment plans, and more detailed surgical recommendations. ANOVA results confirmed the higher suitability of GPT-4 in treatment planning (F(1, 55) = 35.22, p < 0.001) and rehabilitation protocols (F(1, 55) = 32.10, p < 0.001). Cronbach's alpha values indicated higher internal consistency for GPT-4 (α = 0.478) compared with GPT-3.5 (α = 0.234), reflecting more reliable performance. GPT-4 demonstrates superior performance in clinical decision-making for sports surgery and physiotherapy. These findings suggest that advanced AI models can aid diagnostic accuracy, treatment planning, and rehabilitation strategies. However, AI should function as a decision-support tool rather than a substitute for expert clinical judgment. Future studies should explore the integration of AI into real-world clinical workflows, validate the findings using larger datasets, and compare additional AI models beyond the GPT series.

Language: English

Evaluating the Performance and Safety of Large Language Models in Generating Type 2 Diabetes Mellitus Management Plans: A Comparative Study With Physicians Using Real Patient Records DOI Open Access
Agnibho Mondal, Arindam Naskar, Bhaskar Roy Choudhury, et al.

Cureus, Journal Year: 2025, Volume and Issue: unknown

Published: March 17, 2025

Background: The integration of large language models (LLMs) such as GPT-4 into healthcare presents potential benefits and challenges. While LLMs show promise in applications ranging from scientific writing to personalized medicine, their practical utility and safety in clinical settings remain under scrutiny. Concerns about accuracy, ethical considerations, and bias necessitate rigorous evaluation of these technologies against established medical standards. Methods: This study involved a comparative analysis using anonymized patient records from a healthcare setting in the state of West Bengal, India. Management plans for 50 patients with type 2 diabetes mellitus were generated by GPT-4 and by three physicians, who were blinded to each other's responses. These plans were evaluated against a reference management plan based on American Diabetes Society guidelines. Completeness, necessity, and dosage accuracy were quantified, and a Prescribing Error Score was devised to assess the quality of the generated management plans. The safety of the plans generated by GPT-4 was also assessed. Results: Physicians' management plans had fewer missing medications compared with those generated by GPT-4 (p=0.008). However, GPT-4-generated plans included fewer unnecessary medications (p=0.003). No significant difference was observed in the accuracy of drug dosages (p=0.975). The overall error scores were comparable between physicians and GPT-4 (p=0.301). Safety issues were noted in 16% of the plans generated by GPT-4, highlighting the potential risks associated with AI-generated management plans. Conclusion: The study demonstrates that while GPT-4 can effectively reduce unnecessary drug prescriptions, it does not yet match the performance of physicians in terms of plan completeness. The findings support the use of LLMs as supplementary tools in healthcare, highlighting the need for enhanced algorithms and continuous human oversight to ensure the safety and efficacy of artificial intelligence in clinical settings.

Language: English

Citations

0

Evaluation of a Retrieval-Augmented Generation-Powered Chatbot for Pre-CT Informed Consent: a Prospective Comparative Study DOI Creative Commons
Felix Busch, Lukas Kaibel, Hai Nguyen, et al.

Deleted Journal, Journal Year: 2025, Volume and Issue: unknown

Published: March 21, 2025

Abstract: This study aims to investigate the feasibility, usability, and effectiveness of a Retrieval-Augmented Generation (RAG)-powered Patient Information Assistant (PIA) chatbot for pre-CT information counseling compared with the standard physician consultation and informed consent process. This prospective comparative study included 86 patients scheduled for CT imaging between November and December 2024. Patients were randomly assigned to either the PIA group (n = 43), who received pre-CT information via the PIA chat app, or the control group, with standard doctor-led consultation. Patient satisfaction, information clarity and comprehension, and concerns were assessed using six ten-point Likert-scale questions after information counseling with PIA or the doctor's consultation. Additionally, consultation duration was measured, and patients in the PIA group were asked about their preference for pre-CT consultation, while two radiologists rated each PIA chat in five categories. Both groups reported similarly high ratings of information clarity (PIA: 8.64 ± 1.69; control: 8.86 ± 1.28; p = 0.82) and overall comprehension (PIA: 8.81 ± 1.40; control: 8.93 ± 1.61; p = 0.35). However, the doctor consultation group showed greater effectiveness in alleviating patient concerns (8.30 ± 2.63 versus 6.46 ± 3.29; p = 0.003). The PIA group demonstrated significantly shorter subsequent consultation times (median: 120 s [interquartile range (IQR): 100–140] versus 195 s [IQR: 170–220]; p = 0.04). The radiologists' ratings of information quality, scientific and clinical evidence, clinical usefulness and relevance, consistency, and up-to-dateness of the PIA chats were high. The RAG-powered PIA effectively provided pre-CT information while significantly reducing physician consultation time. While both methods achieved comparable patient satisfaction and comprehension, physicians were more effective at addressing patient worries or concerns regarding the examination.

Language: English

Citations

0

An Assessment of the Accuracy and Consistency of ChatGPT in the Management of Midshaft Clavicle Fractures DOI Open Access
Christopha J Knee, Ryan Campbell, Brahman Sivakumar, et al.

Cureus, Journal Year: 2025, Volume and Issue: unknown

Published: April 8, 2025

Language: English

Citations

0

Large Language Model Use Cases in Healthcare Research are Redundant and Often Lack Appropriate Methodological Conduct: A Scoping Review and Call for Improved Practices DOI
Kyle N. Kunze, Cameron Gerhold, Udit Dave, et al.

Arthroscopy The Journal of Arthroscopic and Related Surgery, Journal Year: 2025, Volume and Issue: unknown

Published: April 1, 2025

Language: English

Citations

0
