Comparative evaluation of artificial intelligence models GPT-4 and GPT-3.5 in clinical decision-making in sports surgery and physiotherapy: a cross-sectional study
Sönmez Sağlam, Veysel Uludağ, Zekeriya Okan Karaduman, et al.

BMC Medical Informatics and Decision Making, Journal Year: 2025, Volume and Issue: 25(1)

Published: April 14, 2025

The integration of artificial intelligence (AI) in healthcare has rapidly expanded, particularly in clinical decision-making. Large language models (LLMs) such as GPT-4 and GPT-3.5 have shown potential in various medical applications, including diagnostics and treatment planning. However, their efficacy in specialized fields like sports surgery and physiotherapy remains underexplored. This study aims to compare the performance of GPT-4 and GPT-3.5 in clinical decision-making within these domains using a structured assessment approach. This cross-sectional study included 56 professionals specializing in sports surgery and physiotherapy. Participants evaluated 10 standardized clinical scenarios generated by GPT-4 and GPT-3.5 using a 5-point Likert scale. The scenarios encompassed common musculoskeletal conditions, and assessments focused on diagnostic accuracy, treatment appropriateness, surgical technique detailing, and rehabilitation plan suitability. Data were collected anonymously via Google Forms. Statistical analysis included paired t-tests for direct model comparisons, one-way ANOVA to assess performance across multiple criteria, and Cronbach's alpha to evaluate inter-rater reliability. GPT-4 significantly outperformed GPT-3.5 across all criteria. Paired t-test results (t(55) = 10.45, p < 0.001) demonstrated that GPT-4 provided more accurate diagnoses, superior treatment plans, and more detailed surgical recommendations. ANOVA results confirmed the higher suitability of GPT-4 in treatment planning (F(1, 55) = 35.22, p < 0.001) and rehabilitation protocols (F(1, 55) = 32.10, p < 0.001). Cronbach's alpha values indicated higher internal consistency for GPT-4 (α = 0.478) compared with GPT-3.5 (α = 0.234), reflecting more reliable performance. GPT-4 demonstrates superior performance compared with GPT-3.5 in clinical decision-making for sports surgery and physiotherapy. These findings suggest that advanced AI models can aid in diagnostic accuracy, treatment planning, and rehabilitation strategies. However, AI should function as a decision-support tool rather than a substitute for expert clinical judgment. Future studies should explore the integration of AI into real-world clinical workflows, validate the findings using larger datasets, and compare additional AI models beyond the GPT series.

Language: English

A guide to artificial intelligence for cancer researchers
Raquel Pérez-López, Narmin Ghaffari Laleh, Faisal Mahmood, et al.

Nature reviews. Cancer, Journal Year: 2024, Volume and Issue: 24(6), P. 427 - 441

Published: May 16, 2024

Language: English

Citations: 64

Generative artificial intelligence in healthcare: A scoping review on benefits, challenges and applications
Khadijeh Moulaei, Atiye Yadegari, Mahdi Baharestani, et al.

International Journal of Medical Informatics, Journal Year: 2024, Volume and Issue: 188, P. 105474 - 105474

Published: May 8, 2024

Language: English

Citations: 46

Comparative Evaluation of LLMs in Clinical Oncology
Nicholas R. Rydzewski, Deepak Dinakaran, Shuang G. Zhao, et al.

NEJM AI, Journal Year: 2024, Volume and Issue: 1(5)

Published: April 16, 2024

As artificial intelligence (AI) tools become widely accessible, more patients and medical professionals will turn to them for medical information. Large language models (LLMs), a subset of AI, excel in natural language processing tasks and hold considerable promise for clinical use. Fields such as oncology, in which clinical decisions are highly dependent on a continuous influx of new clinical trial data and evolving guidelines, stand to gain immensely from such advancements. It is therefore of critical importance to benchmark these models and describe their performance characteristics to guide their safe application to clinical oncology. Accordingly, the primary objectives of this work were to conduct comprehensive evaluations of LLMs in the field of oncology and to identify and characterize strategies that medical professionals can use to bolster their confidence in a model's response.

Language: English

Citations: 29

ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives
Pedram Keshavarz, Sara Bagherieh, Seyed Ali Nabipoorashrafi, et al.

Diagnostic and Interventional Imaging, Journal Year: 2024, Volume and Issue: 105(7-8), P. 251 - 265

Published: April 27, 2024

The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.

Language: English

Citations: 24

Systematic Review of Large Language Models for Patient Care: Current Applications and Challenges
Felix Busch, Lena Hoffmann, Christopher Rueger, et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: March 5, 2024

The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize current applications and limitations of LLMs in patient care using a data-driven convergent synthesis approach. We searched 5 databases for qualitative, quantitative, and mixed methods articles on LLMs in patient care published between 2022 and 2023. From 4,349 initial records, 89 studies across 29 medical specialties were included, primarily examining models based on the GPT-3.5 (53.2%, n=66 of 124 different LLMs examined per study) and GPT-4 (26.6%, n=33/124) architectures in medical question answering, followed by patient information generation, including medical text summarization or translation, and clinical documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations included 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations included 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias. In conclusion, this study is the first review to systematically map LLM applications and limitations in patient care, providing a foundational framework and taxonomy for their implementation and evaluation in healthcare settings.

Language: English

Citations: 17

A framework for human evaluation of large language models in healthcare derived from literature review
Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, et al.

npj Digital Medicine, Journal Year: 2024, Volume and Issue: 7(1)

Published: Sept. 28, 2024

With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection and recruitment of evaluators, frameworks and metrics, evaluation process, and statistical analysis type. Our literature review of 142 studies shows gaps in the reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to healthcare LLM developments and deployments, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.

Language: English

Citations: 17

Current applications and challenges in large language models for patient care: a systematic review
Felix Busch, Lena Hoffmann, Christopher Rueger, et al.

Communications Medicine, Journal Year: 2025, Volume and Issue: 5(1)

Published: Jan. 21, 2025

Background: The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize current applications and limitations of LLMs in patient care. Methods: We systematically searched 5 databases for qualitative, quantitative, and mixed methods articles on LLMs in patient care published between 2022 and 2023. From 4349 initial records, 89 studies across 29 medical specialties were included. Quality assessment was performed using the Mixed Methods Appraisal Tool 2018. A data-driven convergent synthesis approach was applied for thematic syntheses of LLM applications and limitations using free line-by-line coding in Dedoose. Results: We show that most studies investigate Generative Pre-trained Transformers (GPT)-3.5 (53.2%, n = 66 of 124 different LLMs examined) and GPT-4 (26.6%, n = 33/124) in answering medical questions, followed by patient information generation, including medical text summarization or translation, and clinical documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations include 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations include 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias. Conclusions: This systematic review maps LLM applications and limitations in patient care, providing a foundational framework and taxonomy for their implementation and evaluation in healthcare settings.

Language: English

Citations: 6

GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial
Ethan Goh, Robert J. Gallo, Eric Strong, et al.

Nature Medicine, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 5, 2025

Language: English

Citations: 6

Uncovering Language Disparity of ChatGPT on Retinal Vascular Disease Classification: Cross-Sectional Study
Xiaocong Liu, Jiageng Wu, An Shao, et al.

Journal of Medical Internet Research, Journal Year: 2023, Volume and Issue: 26, P. e51926 - e51926

Published: Nov. 30, 2023

Benefiting from rich knowledge and the exceptional ability to understand text, large language models like ChatGPT have shown great potential in English clinical environments. However, the performance of ChatGPT in non-English clinical settings, as well as its reasoning, has not been explored in great depth.

Language: English

Citations: 33

Augmented non-hallucinating large language models as medical information curators
Stephen Gilbert, Jakob Nikolas Kather, Aidan Hogan, et al.

npj Digital Medicine, Journal Year: 2024, Volume and Issue: 7(1)

Published: April 23, 2024

Reliably processing and interlinking medical information has been recognized as a critical foundation to the digital transformation of medical workflows, and despite the development of medical ontologies, the optimization of these has been a major bottleneck to digital medicine. The advent of large language models has brought great excitement, and maybe a solution to medicine's 'communication problem' is in sight, but how can the known weaknesses of these models, such as hallucination and non-determinism, be tempered? Retrieval Augmented Generation, particularly through knowledge graphs, is an automated approach that can deliver structured reasoning and a model of truth alongside LLMs, relevant to information structuring and therefore also to decision support.

Language: English

Citations: 15