A Systematic Review of Large Language Models in Medical Specialties: Applications, Challenges and Future Directions
Asma Musabah Alkalbani, Ahmed Salim Alrawahi, Ahmad Salah

et al.

Research Square, Journal Year: 2025, Volume and Issue: unknown

Published: April 16, 2025

Abstract Background: Large Language Models (LLMs) are one of the artificial intelligence (AI) technologies used to understand and generate text, summarize information, and comprehend contextual cues. LLMs have been increasingly used by researchers in various medical applications, but their effectiveness and limitations are still uncertain, especially across specialties. Objective: This review evaluates recent literature on how LLMs are utilized in research studies across 19 medical specialties. It also explores the challenges involved and suggests areas for future focus. Methods: Two reviewers performed searches in PubMed, Web of Science and Scopus to identify studies published from January 2021 to March 2024. The included studies addressed the usage of LLMs in performing medical tasks. Data were extracted and analyzed by five reviewers. To assess the risk of bias, a quality assessment was performed using the revised tool for artificial intelligence-centered diagnostic accuracy studies (QUADAS-AI). Results: Results were synthesized through categorical analysis of evaluation metrics, impact types, and validation approaches. A total of 84 studies were included in this review, mainly originating from two countries: the USA (35/84) and China (16/84). Although the reviewed applications spread across specialties, multi-specialty applications were demonstrated in 22 studies. Various aims include clinical natural language processing (31/84), supporting clinical decision-making (20/84), medical education (15/84), diagnoses and patient management, and patient engagement (3/84). GPT-based and BERT-based models were the most commonly used (83/84). Despite reported positive impacts such as improved efficiency and accuracy, challenges related to reliability and ethics remain. The overall risk of bias was low in 72 studies, high in 11 studies and not clear in 3 studies. Conclusion: GPT-based and BERT-based models dominate LLM use across specialties, with over 98.8% of studies using these models. Despite potential benefits in processes and diagnostics, a key finding concerns the substantial variability in performance among LLMs. For instance, LLMs' accuracy ranged from 3% in decision support to 90% in some NLP tasks. Heterogeneity in LLM utilization across diverse tasks and contexts prevented a meaningful meta-analysis, as studies lacked standardized methodologies, outcome measures, and implementation approaches. Therefore, room for improvement remains wide in developing domain-specific data and establishing standards to ensure reliability and effectiveness.

Language: English

Assessment of artificial intelligence applications in responding to dental trauma
İdil Özden, Merve Gökyar, Mustafa Özden

et al.

Dental Traumatology, Journal Year: 2024, Volume and Issue: 40(6), P. 722 - 729

Published: May 14, 2024

This study assessed the consistency and accuracy of responses provided by two artificial intelligence (AI) applications, ChatGPT and Google Bard (Gemini), to questions related to dental trauma.

Language: English

Citations: 24

Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study
Fadi Aljamaan, Mohamad‐Hani Temsah, Ibraheem Altamimi

et al.

JMIR Medical Informatics, Journal Year: 2024, Volume and Issue: 12, P. e54345 - e54345

Published: July 3, 2024

Artificial intelligence (AI) chatbots have recently gained use in medical practice by health care practitioners. Interestingly, the output of these AI chatbots was found to contain varying degrees of hallucination in content and references. Such hallucinations raise doubts about their implementation.

Language: English

Citations: 24

Improving readability in AI-generated medical information on fragility fractures: the role of prompt wording on ChatGPT’s responses
Hakan Akkan, Gülce Kallem Seyyar

Osteoporosis International, Journal Year: 2025, Volume and Issue: 36(3), P. 403 - 410

Published: Jan. 8, 2025

Language: English

Citations: 2

Exploring the Role of ChatGPT-4, BingAI, and Gemini as Virtual Consultants to Educate Families about Retinopathy of Prematurity
Ceren Durmaz Engin, Ezgi Karatas, Taylan Öztürk

et al.

Children, Journal Year: 2024, Volume and Issue: 11(6), P. 750 - 750

Published: June 20, 2024

Large language models (LLMs) are becoming increasingly important as they are being used more frequently for providing medical information. Our aim is to evaluate the effectiveness of electronic artificial intelligence (AI) large language models (LLMs), such as ChatGPT-4, BingAI, and Gemini, in responding to patient inquiries about retinopathy of prematurity (ROP).

Language: English

Citations: 12

Large language models in patient education: a scoping review of applications in medicine
Serhat Aydın, Mert Karabacak, Victoria Vlachos

et al.

Frontiers in Medicine, Journal Year: 2024, Volume and Issue: 11

Published: Oct. 29, 2024

Large Language Models (LLMs) are sophisticated algorithms that analyze and generate vast amounts of textual data, mimicking human communication. Notable LLMs include GPT-4o by OpenAI, Claude 3.5 Sonnet by Anthropic, and Gemini by Google. This scoping review aims to synthesize the current applications and potential uses of LLMs in patient education and engagement.

Language: English

Citations: 9

De novo generation of colorectal patient educational materials using large language models: Prompt engineering key to improved readability
India E. Ellison, Wendelyn M. Oslock, Abiha Abdullah

et al.

Surgery, Journal Year: 2025, Volume and Issue: 180, P. 109024 - 109024

Published: Jan. 4, 2025

Language: English

Citations: 1

Adopting artificial intelligence for health information literacy: A literature review
Godwin Dzangare, Thabo Ayibongwe Gulu

Information Development, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 25, 2025

Purpose – Artificial Intelligence (AI) is increasingly becoming a popular source of information, including health information. It is essential to explore the adoption of AI to achieve Health Information Literacy (HIL) and to ensure that users maximise the use of AI. This study explores AI's role in advancing HIL. It identifies gaps, concerns and challenges, and suggests areas where AI could be improved. Approach – The retrieved papers were initially assessed on title and abstract against the inclusion criteria. The full text of relevant papers was then verified against the exclusion criteria. Additionally, a comprehensive assessment of the reference lists of the included papers was performed. Data were extracted from the selected articles, and bibliometric and thematic analyses were applied for a thorough examination. Methodology – Key details, including author, publication year, type, purpose and key findings, were collected using a standardised format. As themes emerged, information from the publications was used to address the main research questions. All articles reviewed were in English and published between 2019 and 2024. Findings – The growing role of AI in HIL is reflected in a 128.13% growth in publications. However, concerns must be addressed so that continuous use of AI can be guaranteed. Originality – This is likely the first study to assess the current state of AI in HIL; its findings will provide a clear landscape for investing, identifying partners and addressing the gap.

Language: English

Citations: 1

Enhancing Health Literacy: Evaluating the Readability of Patient Handouts Revised by ChatGPT's Large Language Model
Austin R. Swisher, Arthur W. Wu, Godfrey K. F. Liu

et al.

Otolaryngology, Journal Year: 2024, Volume and Issue: 171(6), P. 1751 - 1757

Published: Aug. 6, 2024

Objective: To use an artificial intelligence (AI)-powered large language model (LLM) to improve the readability of patient handouts.

Language: English

Citations: 8

Artificial Intelligence in Hand Surgery – How Generative AI is Transforming the Hand Surgery Landscape
Ruth En Si Tan, Wendy Z. W. Teo, Mark E. Puhaindran

et al.

The Journal of Hand Surgery (Asian-Pacific Volume), Journal Year: 2024, Volume and Issue: 29(02), P. 81 - 87

Published: March 26, 2024

Artificial intelligence (AI) has witnessed significant advancements, reshaping various industries, including healthcare. The introduction of ChatGPT by OpenAI in November 2022 marked a pivotal moment, showcasing the potential of generative AI in revolutionising patient care, diagnosis and treatment. Generative AI, unlike traditional AI systems, possesses the ability to generate new content by understanding patterns within datasets. This article explores the evolution of AI in healthcare, tracing its roots to the term coined by John McCarthy in 1955 and the contributions of pioneers like Von Neumann and Alan Turing. Currently, generative AI, particularly Large Language Models, holds promise across three broad categories in healthcare: clinical care, education and research. In clinical care, it offers solutions in clinical document management, diagnostic support and operative planning. Notable advancements include Microsoft’s collaboration with Epic for integrating AI into electronic medical records (EMRs), enhancing data management and patient care. Furthermore, generative AI aids surgical decision-making, as demonstrated in plastic, orthopaedic and hepatobiliary surgeries. However, challenges such as bias, hallucination and integration with EMR systems necessitate caution and ongoing evaluation. The article also presents insights from the implementation of NUHS Russell-GPT, a generative AI chatbot, in a hand surgery department, demonstrating its utility in administrative tasks but highlighting the need for careful planning and integration. A survey showed unanimous interest in incorporating generative AI in clinical settings, with all respondents being open to its use. In conclusion, generative AI is poised to enhance patient care and ease physician workloads, starting with automating administrative tasks and evolving to inform diagnoses and tailored treatment plans, as well as aid surgical decision-making. As healthcare professionals navigate the complexities of AI, its benefits for both physicians and patients remain significant, offering a glimpse of a future where AI transforms healthcare delivery. Level of Evidence: V (Diagnostic)

Language: English

Citations: 7

Performance of Artificial Intelligence Chatbots in Responding to Patient Queries Related to Traumatic Dental Injuries: A Comparative Study
Yeliz Güven, Ömer Tarık Özdemir, Melis Yazır Kavan

et al.

Dental Traumatology, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 22, 2024

ABSTRACT Background/Aim: Artificial intelligence (AI) chatbots have become increasingly prevalent in recent years as potential sources of online healthcare information for patients when making medical/dental decisions. This study assessed the readability, quality, and accuracy of responses provided by three AI chatbots to questions related to traumatic dental injuries (TDIs), either retrieved from popular question‐answer sites or manually created based on hypothetical case scenarios. Materials and Methods: A total of 59 traumatic dental injury queries were directed at ChatGPT 3.5, ChatGPT 4.0, and Google Gemini. Readability was evaluated using the Flesch Reading Ease (FRE) and Flesch–Kincaid Grade Level (FKGL) scores. To assess response quality and accuracy, the DISCERN tool, the Global Quality Score (GQS), and misinformation scores were used. The understandability and actionability of the responses were analyzed using the Patient Education Materials Assessment Tool for Printed Materials (PEMAT‐P). Statistical analysis included the Kruskal–Wallis test with Dunn's post hoc test for non‐normal variables, and one‐way ANOVA with Tukey's post hoc test for normal variables (p < 0.05). Results: The mean FKGL and FRE scores for ChatGPT 3.5, ChatGPT 4.0, and Google Gemini were 11.2 and 49.25, 11.8 and 46.42, and 10.1 and 51.91, respectively, indicating that the responses were difficult to read and required a college‐level reading ability. ChatGPT 3.5 had the lowest PEMAT‐P scores among the chatbots (p < 0.001). ChatGPT 4.0 and Google Gemini were rated higher in quality (GQS score of 5) compared with ChatGPT 3.5. Conclusions: In this study, although widely used, ChatGPT 3.5 provided some misleading and inaccurate responses to questions about TDIs. In contrast, ChatGPT 4.0 and Google Gemini generated more accurate and comprehensive answers, making them more reliable as auxiliary information sources. However, for complex issues like TDIs, no chatbot can replace a dentist for diagnosis, treatment, and follow‐up care.
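The FRE and FKGL scores reported above are conventionally computed with the standard Flesch formulas, which combine average sentence length (words per sentence) and average word length (syllables per word). The abstract does not say which software the authors used, so the sketch below is only an illustration under that assumption: it takes word, sentence and syllable counts as given (syllable counting differs between tools) and applies the published coefficients. The `TextCounts` container and the sample counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextCounts:
    """Raw counts for a passage (hypothetical container, not from the study)."""
    words: int
    sentences: int
    syllables: int

    def __post_init__(self) -> None:
        # Both formulas divide by words and by sentences, so reject empty text.
        if self.words <= 0 or self.sentences <= 0:
            raise ValueError("words and sentences must be positive")


def flesch_reading_ease(c: TextCounts) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    words_per_sentence = c.words / c.sentences
    syllables_per_word = c.syllables / c.words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(c: TextCounts) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade needed."""
    words_per_sentence = c.words / c.sentences
    syllables_per_word = c.syllables / c.words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    # Hypothetical passage: 100 words, 5 sentences, 150 syllables.
    sample = TextCounts(words=100, sentences=5, syllables=150)
    print(f"FRE  = {flesch_reading_ease(sample):.2f}")
    print(f"FKGL = {flesch_kincaid_grade(sample):.2f}")
```

Because both scores depend only on words per sentence and syllables per word, shorter sentences and shorter words raise FRE and lower FKGL; this is why the two metrics in the results move in opposite directions across the chatbots.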

Language: English

Citations: 7