A Systematic Review of Large Language Models in Medical Specialties: Applications, Challenges and Future Directions
Asma Musabah Alkalbani, Ahmed Salim Alrawahi, Ahmad Salah et al.

Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown

Published: April 16, 2025

Abstract Background: Large Language Models (LLMs) are one of the artificial intelligence (AI) technologies used to understand and generate text, summarize information, and comprehend contextual cues. LLMs have been increasingly used by researchers in various medical applications, but their effectiveness and limitations are still uncertain, especially across medical specialties. Objective: This review evaluates recent literature on how LLMs are utilized in research studies across 19 medical specialties. It also explores the challenges involved and suggests areas for future focus. Methods: Two reviewers performed literature searches in PubMed, Web of Science and Scopus to identify studies published from January 2021 to March 2024. The included studies reported the usage of LLMs in performing medical tasks. Data was extracted and analyzed by five reviewers. To assess the risk of bias, quality assessment was performed using the revised tool for artificial intelligence-centered diagnostic accuracy studies (QUADAS-AI). Results: Results were synthesized through categorical analysis of evaluation metrics, impact types, and validation approaches. A total of 84 studies were included in this review, mainly originating from two countries: the USA (35/84) and China (16/84). Although the reviewed LLM applications were spread across medical specialties, multi-specialty applications were demonstrated in 22 studies. The various aims of the studies include clinical natural language processing (31/84), supporting clinical decisions (20/84), education (15/84), diagnoses, and patient management and engagement (3/84). GPT-based and BERT-based models were used in most (83/84) studies. Despite the reported positive impacts, such as improved efficiency and accuracy, challenges related to reliability and ethics remain. The overall risk of bias was low in 72 studies, high in 11 studies, and not clear in 3 studies. Conclusion: GPT-based and BERT-based models dominate medical specialty applications, with over 98.8% of studies using these models. Despite the potential benefits in process efficiency and diagnostics, a key finding is the substantial variability in performance among LLMs. For instance, LLMs' accuracy ranged from 3% in some decision support to 90% in some NLP tasks. Heterogeneity in the utilization of LLMs across diverse medical tasks and contexts prevented meaningful meta-analysis, as studies lacked standardized methodologies, outcome measures, and implementation approaches. Therefore, room for improvement remains wide in developing domain-specific data and establishing standards to ensure reliability and effectiveness.

Language: English

Evaluating large language models and agents in healthcare: key challenges in clinical applications (Creative Commons)
Xiaolan Chen, Jie Xiang, Shanfu Lu et al.

Intelligent Medicine, Journal Year: 2025, Volume and Issue: unknown

Published: March 1, 2025

Language: English

Citations

0