Assessing Large Language Models for Oncology Data Inference From Radiology Reports
L. Chen, Travis Zack, Arda Demirci, et al.

JCO Clinical Cancer Informatics, Journal Year: 2024, Volume and Issue: 8

Published: Dec. 1, 2024

We examined the effectiveness of proprietary and open large language models (LLMs) in detecting disease presence, location, and treatment response in pancreatic cancer from radiology reports.

Language: English

A future role for health applications of large language models depends on regulators enforcing safety standards
Oscar Freyer, Isabella C. Wiest, Jakob Nikolas Kather, et al.

The Lancet Digital Health, Journal Year: 2024, Volume and Issue: 6(9), P. e662 - e672

Published: Aug. 23, 2024

Amid the rapid integration of artificial intelligence into clinical settings, large language models (LLMs), such as Generative Pre-trained Transformer-4, have emerged as multifaceted tools with potential for health-care delivery, diagnosis, and patient care. However, the deployment of LLMs raises substantial regulatory and safety concerns. Due to their high output variability, poor inherent explainability, and the risk of so-called AI hallucinations, LLM-based applications that serve a medical purpose face challenges for approval as medical devices under US and EU laws, including the recently passed Artificial Intelligence Act. Despite unaddressed risks to patients, including misdiagnosis and unverified medical advice, such applications are already available on the market. The regulatory ambiguity surrounding these tools creates an urgent need for frameworks that accommodate their unique capabilities and limitations. Alongside the development of such frameworks, existing regulations should be enforced. If regulators fear enforcing them in a market dominated by technology companies, the consequences of harm to laypeople will force belated action, damaging the potential of LLM-based medical advice.

Language: English

Citations: 26

Large Language Models lack essential metacognition for reliable medical reasoning
Maxime Griot, Coralie Hemptinne, Jean Vanderdonckt, et al.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)

Published: Jan. 14, 2025

Language: English

Citations: 3

Large language models improve the identification of emergency department visits for symptomatic kidney stones
Cosmin A. Bejan, Amy E. McCart Reed, Matthew Mikula, et al.

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: Jan. 28, 2025

Recent advancements of large language models (LLMs) like generative pre-trained transformer 4 (GPT-4) have generated significant interest in the scientific community. Yet the potential of these models to be utilized in clinical settings remains largely unexplored. In this study, we investigated the abilities of multiple LLMs and a traditional machine learning model to analyze emergency department (ED) reports and determine whether the corresponding visits were due to symptomatic kidney stones. Leveraging a dataset of manually annotated ED reports, we developed strategies to enhance performance, including prompt optimization, zero- and few-shot prompting, fine-tuning, and prompt augmentation. Further, we implemented fairness assessment and bias mitigation methods to investigate performance disparities with respect to race and gender. A clinical expert assessed the explanations generated by GPT-4 for its predictions to determine whether they were sound, factually correct, unrelated to the input prompt, or potentially harmful. The best results were achieved by GPT-4 (macro-F1 = 0.833, 95% confidence interval [CI] 0.826-0.841) and GPT-3.5 (macro-F1 = 0.796, 95% CI 0.796-0.796). Ablation studies revealed that the initial model benefits from fine-tuning. Adding demographic information and prior disease history to the prompts allows the models to make better decisions. Bias assessment found that GPT-4 exhibited no racial or gender disparities, in contrast to GPT-3.5, which failed to handle diversity effectively.
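The study above reports results as macro-F1, the unweighted average of per-class F1 scores, which treats the "stone" and "no stone" classes equally regardless of class imbalance. A minimal sketch of that metric, with toy labels that are not from the study's data:

```python
from collections import defaultdict

def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average with equal weight."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but true class was t
            fn[t] += 1  # missed an instance of true class t
    f1s = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if (precision + recall) else 0.0)
    return sum(f1s) / len(f1s)

# Toy example: 1 = visit due to symptomatic kidney stones, 0 = not
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(round(macro_f1(y_true, y_pred), 3))  # 0.75 for this toy data
```

In practice a library implementation such as scikit-learn's `f1_score(..., average="macro")` would typically be used; the hand-rolled version just makes the averaging explicit.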

Language: English

Citations: 2

The Large Language Model ChatGPT-4 Demonstrates Excellent Triage Capabilities and Diagnostic Performance for Patients Presenting with Various Causes of Knee Pain
Kyle N. Kunze, Nathan H. Varady, Michael Mazzucco, et al.

Arthroscopy: The Journal of Arthroscopic and Related Surgery, Journal Year: 2024, Volume and Issue: unknown

Published: June 1, 2024

Language: English

Citations: 14

Superhuman performance on urology board questions using an explainable language model enhanced with European Association of Urology guidelines
Martin J. Hetz, Nicolas Carl, Sarah Haggenmüller, et al.

ESMO Real World Data and Digital Oncology, Journal Year: 2024, Volume and Issue: 6, P. 100078 - 100078

Published: Oct. 4, 2024

Language: English

Citations: 7

Fine-Tuning LLMs for Specialized Use Cases
D. M. Anisuzzaman, Jeffrey G. Malins, Paul A. Friedman, et al.

Mayo Clinic Proceedings Digital Health, Journal Year: 2024, Volume and Issue: 3(1), P. 100184 - 100184

Published: Nov. 29, 2024

Large language models (LLMs) are a type of artificial intelligence that operates by predicting and assembling sequences of words that are statistically likely to follow a given text input. With this basic ability, LLMs are able to answer complex questions and follow extremely detailed instructions. Products created using LLMs, such as ChatGPT by OpenAI and Claude by Anthropic, have gained a huge amount of traction and user engagement and have revolutionized the way we interact with technology, bringing a new dimension to human-computer interaction. Fine-tuning is a process in which a pretrained model, such as an LLM, is further trained on a custom data set to adapt it for specialized tasks or domains. In this review, we outline some major methodologic approaches and techniques that can be used to fine-tune LLMs for specialized use cases and enumerate the general steps required for carrying out LLM fine-tuning. We then illustrate a few of these approaches by describing several specific fine-tuning examples across medical subspecialties. Finally, we close with a consideration of the benefits and limitations associated with fine-tuning for specialized use cases, with an emphasis on concerns in the field of medicine.
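The abstract's core idea, predicting and assembling statistically likely next words, can be illustrated at toy scale with a bigram model. This is a deliberately simplified stand-in for a neural LLM, not anything from the review itself; the corpus and function names are invented for illustration:

```python
from collections import defaultdict, Counter

def train_bigram(corpus):
    """Count, for each word, how often each other word follows it."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            model[a][b] += 1
    return model

def generate(model, start, length=5):
    """Greedily append the statistically most likely next word each step."""
    out = [start]
    for _ in range(length):
        followers = model[out[-1]]
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

corpus = [
    "the model predicts the next word",
    "the model is trained on text",
    "the next word is predicted",
]
model = train_bigram(corpus)
print(generate(model, "the", length=4))
```

Fine-tuning, in this analogy, would amount to updating the counts on a small domain-specific corpus so that the "most likely next word" shifts toward specialized vocabulary; real LLM fine-tuning updates neural network weights rather than counts.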

Language: English

Citations: 7

Use of Artificial Intelligence for Liver Diseases: A Survey from the EASL Congress 2024
Laura Žigutytė, Thomas Sorz, Jan Clusmann, et al.

JHEP Reports, Journal Year: 2024, Volume and Issue: 6(12), P. 101209 - 101209

Published: Sept. 6, 2024

Language: English

Citations: 4

Generative AI Chatbots for Reliable Cancer Information: Evaluating web-search, multilingual, and reference capabilities of emerging large language models
Bradley D. Menz, Natansh D. Modi, Ahmad Y. Abuhelwa, et al.

European Journal of Cancer, Journal Year: 2025, Volume and Issue: 218, P. 115274 - 115274

Published: Feb. 4, 2025

Recent advancements in large language models (LLMs) enable real-time web search, improved referencing, and multilingual support, yet ensuring that they provide safe health information remains crucial. This perspective evaluates seven publicly accessible LLMs (ChatGPT, Co-Pilot, Gemini, MetaAI, Claude, Grok, and Perplexity) on three simple cancer-related queries across eight languages (336 responses: English, French, Chinese, Thai, Hindi, Nepali, Vietnamese, and Arabic). None of the 42 English responses contained clinically meaningful hallucinations, whereas 7 of 294 non-English responses did. 48% (162/336) of responses included valid references, but 39 references were .com links, reflecting quality concerns. Readability frequently exceeded an eighth-grade level, and many outputs were also complex. These findings reflect substantial progress over the past two years but reveal persistent gaps in accuracy, reliable reference inclusion, referral practices, and readability. Ongoing benchmarking is essential to ensure LLMs safely support global audiences and meet online health information standards.
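The "eighth-grade level" readability threshold mentioned above is typically measured with a formula such as the Flesch-Kincaid grade level. A rough sketch of that computation, using a crude vowel-group syllable heuristic (the study does not specify its exact readability tooling, so treat this as illustrative only):

```python
import re

def count_syllables(word):
    """Crude syllable estimate: contiguous vowel groups, minus a trailing silent 'e'."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def fk_grade(text):
    """Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

print(round(fk_grade("The cat sat on the mat."), 2))  # short words score low
print(round(fk_grade(
    "Ongoing benchmarking is essential to ensure multilingual accessibility."), 2))
# the polysyllabic sentence scores well above an eighth-grade level
```

Production readability tools use dictionary-based syllable counts and more careful sentence splitting, so scores from this sketch will differ somewhat from published values.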

Language: English

Citations: 0

Enhancing healthcare resource allocation through large language models
Fang Wan, Kezhi Wang, Tao Wang, et al.

Swarm and Evolutionary Computation, Journal Year: 2025, Volume and Issue: 94, P. 101859 - 101859

Published: Feb. 5, 2025

Language: English

Citations: 0

Comparative evaluation and performance of large language models on expert level critical care questions: a benchmark study
Jessica D. Workum, Bas W. S. Volkers, Davy van de Sande, et al.

Critical Care, Journal Year: 2025, Volume and Issue: 29(1)

Published: Feb. 10, 2025

Abstract
Background: Large language models (LLMs) show increasing potential for use in healthcare, from administrative support to clinical decision making. However, reports on their performance in critical care medicine are lacking.
Methods: This study evaluated five LLMs (GPT-4o, GPT-4o-mini, GPT-3.5-turbo, Mistral Large 2407, and Llama 3.1 70B) on 1181 multiple-choice questions (MCQs) from the gotheextramile.com database, a comprehensive database of questions at European Diploma in Intensive Care examination level. Their performance was compared to random guessing and to 350 human physicians on a 77-MCQ practice test. Metrics included accuracy, consistency, and domain-specific performance. Costs, as a proxy for energy consumption, were also analyzed.
Results: GPT-4o achieved the highest accuracy at 93.3%, followed by Llama 3.1 70B (87.5%), Mistral Large 2407 (87.9%), GPT-4o-mini (83.0%), and GPT-3.5-turbo (72.7%). Random guessing yielded 41.5% (p < 0.001). On the practice test, all models surpassed the human physicians, scoring 89.0%, 80.9%, 84.4%, 80.3%, and 66.5%, respectively, compared with 42.7% for random guessing (p < 0.001) and 61.9% for the physicians. In contrast to the other models (p < 0.001), GPT-3.5-turbo did not significantly outperform the physicians (p = 0.196). Despite high overall accuracy, all models gave consistently incorrect answers to certain questions. The most expensive model was GPT-4o, costing over 25 times more than the least expensive model, GPT-4o-mini.
Conclusions: LLMs exhibit exceptional performance in critical care medicine, with four outperforming human physicians on a European-level practice exam. GPT-4o led but raised concerns about costs and energy consumption. All models produced consistently incorrect answers, highlighting the need for thorough and ongoing evaluations to guide responsible implementation in clinical settings.
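The study's random-guessing baseline (41.5% rather than a flat 25%) implies the MCQs vary in their number of answer options. A hedged sketch of how such a baseline and a simple significance check against it could be computed; the option-count mix below is invented for illustration and is not the actual exam composition:

```python
import math

def expected_random_accuracy(option_counts):
    """Expected accuracy when guessing uniformly at random on each question."""
    return sum(1 / k for k in option_counts) / len(option_counts)

def z_vs_baseline(correct, n, baseline):
    """Normal-approximation z-score of observed accuracy vs a guessing baseline."""
    p_hat = correct / n
    se = math.sqrt(baseline * (1 - baseline) / n)
    return (p_hat - baseline) / se

# Hypothetical mix of 2- to 5-option questions totalling 1181 items
counts = [2] * 400 + [3] * 400 + [4] * 300 + [5] * 81
baseline = expected_random_accuracy(counts)
print(round(baseline, 3))

# A model answering 1102 of 1181 correctly (93.3%) against that baseline
print(round(z_vs_baseline(1102, 1181, baseline), 1))
```

With over a thousand questions, even modest gaps over the guessing baseline become highly significant; the interesting comparisons in the paper are therefore between models and against the physician cohort, not against chance.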

Language: English

Citations: 0