Advancing radiology practice and research: harnessing the potential of large language models amidst imperfections DOI Creative Commons
Eyal Klang, Lee Alper, Vera Sorin

et al.

BJR|Open, Journal Year: 2023, Volume and Issue: 6(1)

Published: Dec. 12, 2023

Abstract Large language models (LLMs) are transforming the field of natural language processing (NLP). These models offer opportunities for radiologists to make a meaningful impact in their field. NLP is a branch of artificial intelligence (AI) that uses computer algorithms to study and understand text data. Recent advances include the attention mechanism and the Transformer architecture. Transformer-based LLMs, such as GPT-4 and Gemini, are trained on massive amounts of data and can generate human-like text. They are well suited to analysing large volumes of text in academic research and in clinical radiology practice. Despite their promise, LLMs have limitations, including dependency on the diversity and quality of their training data and the potential for false outputs. Despite these imperfections, the use of LLMs in radiology holds promise and is gaining momentum. By embracing these models, radiologists can gain valuable insights and improve the efficiency of their work, ultimately leading to improved patient care.
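As a rough illustration of the attention mechanism this abstract refers to, the sketch below computes scaled dot-product attention with NumPy; the array shapes and values are arbitrary placeholders for illustration, not anything taken from the cited paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ V                                    # weighted sum of values

# Toy example: 3 tokens with 4-dimensional embeddings (random placeholder values).
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```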

Language: English

Lung Cancer Staging Using Chest CT and FDG PET/CT Free-Text Reports: Comparison Among Three ChatGPT Large-Language Models and Six Human Readers of Varying Experience DOI
Jong Eun Lee, Ki Seong Park, Yun‐Hyeon Kim

et al.

American Journal of Roentgenology, Journal Year: 2024, Volume and Issue: 223(6)

Published: Sept. 4, 2024

Although radiology reports are commonly used for lung cancer staging, this task can be challenging given radiologists' variable reporting styles as well as reports' potentially ambiguous and/or incomplete staging-related information.

Language: English

Citations

8

Navigating Artificial Intelligence in Scientific Manuscript Writing: Tips and Traps DOI Creative Commons
Ishan Kumar, Nidhi Yadav, Ashish Verma

et al.

Indian Journal of Radiology and Imaging, Journal Year: 2025, Volume and Issue: 35(S 01), P. S178 - S186

Published: Jan. 1, 2025

Abstract It is increasingly recognized that the strategic use of artificial intelligence (AI) can catalyze the process of manuscript writing. However, it is imperative that we recognize the hidden biases, pitfalls, and disadvantages of relying solely on AI, such as accuracy concerns and the potential erosion of nuanced human insight. With an emphasis on crafting effective prompts and inputs, this article shows how to navigate the labyrinth of AI capabilities and create a good-quality manuscript. It also addresses evolving guidelines from various publishers, shedding light on how to “leverage the digital genie” responsibly and ethically. We further explore which tools can be harnessed for literature reviews, executing statistical analyses, and polishing language. Providing practical strategies for maximizing AI's benefits, the article underscores the indispensable value of human creativity and critical thinking, stressing that while AI can “streamline the mundane,” the author's insight remains vital for profound intellectual contributions.

Language: English

Citations

0

Effective Structured Information Extraction from Chest Radiography Reports Using Open-Weights Large Language Models DOI
James C. Gee, Michael S. Yao

Radiology, Journal Year: 2025, Volume and Issue: 314(1)

Published: Jan. 1, 2025

Language: English

Citations

0

Large Language Models and Large Multimodal Models in Medical Imaging: A Primer for Physicians DOI Open Access
Tyler Bradshaw, Xin Tie, Joshua Warner

et al.

Journal of Nuclear Medicine, Journal Year: 2025, Volume and Issue: unknown, P. jnumed.124.268072 - jnumed.124.268072

Published: Jan. 16, 2025

Large language models (LLMs) are poised to have a disruptive impact on health care. Numerous studies have demonstrated promising applications of LLMs in medical imaging, and this number will grow as LLMs further evolve into large multimodal models (LMMs) capable of processing both text and images. Given the substantial roles that LLMs and LMMs will play in health care, it is important for physicians to understand the underlying principles of these technologies so they can use them more effectively and responsibly and help guide their development. This article explains the key concepts behind the development and application of LLMs, including token embeddings, transformer networks, self-supervised pretraining, fine-tuning, and others. It also describes the technical process of creating LMMs and discusses use cases in medical imaging.
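To make the token-embedding concept mentioned in this primer concrete, here is a minimal sketch of how text becomes token IDs and then embedding vectors; the toy vocabulary, dimensions, and random weights are assumptions for illustration, not the method of the cited article (real tokenizers use learned subword vocabularies and learned embedding tables).

```python
import numpy as np

# Toy vocabulary; real tokenizers use subword units and tens of thousands of tokens.
vocab = {"<unk>": 0, "chest": 1, "ct": 2, "shows": 3, "no": 4, "nodule": 5}
embedding_dim = 8
rng = np.random.default_rng(42)
embedding_table = rng.normal(size=(len(vocab), embedding_dim))  # learned in a real model

def embed(text: str) -> np.ndarray:
    """Map whitespace tokens to IDs, then look up their embedding vectors."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]
    return embedding_table[ids]  # shape: (num_tokens, embedding_dim)

print(embed("Chest CT shows no nodule").shape)  # (5, 8)
```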

Language: English

Citations

0

Aligning large language models with radiologists by reinforcement learning from AI feedback for chest CT reports DOI Creative Commons
Lifang Yang, Yuxing Zhou, Jun Qi

et al.

European Journal of Radiology, Journal Year: 2025, Volume and Issue: 184, P. 111984 - 111984

Published: Feb. 6, 2025

Language: English

Citations

0

Benchmarking the diagnostic performance of open source LLMs in 1933 Eurorad case reports DOI Creative Commons
Su Hwan Kim, Severin Schramm, Lisa C. Adams

et al.

npj Digital Medicine, Journal Year: 2025, Volume and Issue: 8(1)

Published: Feb. 12, 2025

Abstract Recent advancements in large language models (LLMs) have created new ways to support radiological diagnostics. While both open-source and proprietary LLMs can address privacy concerns through local or cloud deployment, open-source models provide advantages in continuity of access and potentially lower costs. This study evaluated the diagnostic performance of fifteen open-source LLMs and one closed-source LLM (GPT-4o) on 1,933 cases from the Eurorad library. The LLMs provided differential diagnoses based on clinical history and imaging findings. Responses were considered correct if the true diagnosis appeared in the top three suggestions. Models were further tested on 60 non-public brain MRI cases from a tertiary hospital to assess generalizability. In both datasets, GPT-4o demonstrated superior performance, closely followed by Llama-3-70B, revealing how open-source LLMs are rapidly closing the gap to proprietary models. Our findings highlight the potential of open-source LLMs as decision support tools for challenging, real-world cases.
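The correctness criterion described here (true diagnosis within the top three suggestions) is essentially top-3 accuracy. Below is a minimal sketch of that metric; the case data are made up for illustration and do not come from the cited study.

```python
def top_k_accuracy(cases, k=3):
    """Fraction of cases whose reference diagnosis appears in the model's top-k list."""
    hits = sum(
        case["reference"].lower() in (d.lower() for d in case["suggestions"][:k])
        for case in cases
    )
    return hits / len(cases)

# Hypothetical example cases, not data from the cited study.
cases = [
    {"reference": "meningioma", "suggestions": ["glioblastoma", "meningioma", "metastasis"]},
    {"reference": "abscess", "suggestions": ["glioma", "lymphoma", "demyelination"]},
]
print(top_k_accuracy(cases, k=3))  # 0.5
```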

Language: English

Citations

0

Prompts to Table: Specification and Iterative Refinement for Clinical Information Extraction with Large Language Models DOI Creative Commons
David Hein, Alana Christie, Michael J. Holcomb

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 13, 2025

Extracting structured data from free-text medical records is laborious and error-prone. Traditional rule-based and early neural network methods often struggle with domain complexity and require extensive tuning. Large language models (LLMs) offer a promising solution but must be tailored to nuanced clinical knowledge and complex, multipart entities. We developed a flexible, end-to-end LLM pipeline to extract diagnoses, per-specimen anatomical sites, procedures, histology, and detailed immunohistochemistry results from pathology reports. A human-in-the-loop process to create validated reference annotations for a development set of 152 kidney tumor reports guided iterative refinement. To drive assessment of performance we developed a comprehensive error ontology, categorizing errors by significance (major vs. minor), source (LLM, manual annotation, or insufficient instructions), and contextual origin. The finalized pipeline was applied to 3,520 internal reports (of which 2,297 had pre-existing templated data available for cross-referencing) and evaluated for adaptability using 53 publicly available breast cancer reports. After six iterations, major errors on the development set decreased to 0.99% (14/1413 entities). We identified 11 key contexts in which complications arose, including history integration, entity linking, and specification granularity, which provided valuable insight in understanding our research goals. Using the templated data as reference, the pipeline achieved a macro-averaged F1 score of 0.99 for identifying subtypes and 0.97 for detecting metastasis. When adapted to the breast cancer dataset, three iterations were required to align domain-specific instructions, attaining 89% agreement with curated data. This work illustrates that LLM-based extraction pipelines can achieve near expert-level accuracy with carefully constructed instructions tailored to specific aims. Beyond raw metrics, the refinement process itself, balancing specificity and relevance, proved essential. The approach offers a transferable blueprint for applying emerging LLM capabilities to other complex clinical information extraction tasks.
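As a rough sketch of the kind of schema-driven extraction such a pipeline performs, the snippet below builds an instruction prompt and parses a JSON response. The `call_llm` function is a hypothetical stand-in for whatever model endpoint is used, and the field names are illustrative, not the entity schema of the cited study.

```python
import json

SCHEMA_INSTRUCTIONS = """Extract the following fields from the pathology report and
return strict JSON: diagnosis, anatomical_site, procedure, histology.
Use null for any field not stated in the report."""

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; stubbed here with a canned response for illustration."""
    return (
        '{"diagnosis": "clear cell renal cell carcinoma", '
        '"anatomical_site": "left kidney", '
        '"procedure": "partial nephrectomy", '
        '"histology": "clear cell"}'
    )

def extract_fields(report_text: str) -> dict:
    prompt = f"{SCHEMA_INSTRUCTIONS}\n\nReport:\n{report_text}"
    raw = call_llm(prompt)
    return json.loads(raw)  # downstream validation against the schema would go here

print(extract_fields("Left kidney, partial nephrectomy: clear cell renal cell carcinoma."))
```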

Language: English

Citations

0

Large language models for error detection in radiology reports: a comparative analysis between closed-source and privacy-compliant open-source models DOI Creative Commons
Babak Salam, Claire Stüwe, Sebastian Nowak

et al.

European Radiology, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 20, 2025

Language: English

Citations

0

Assessing large language models for Lugano classification of malignant lymphoma in Japanese FDG-PET reports DOI Creative Commons
Rintaro Ito, Keita Kato, Kosuke Nanataki

et al.

Deleted Journal, Journal Year: 2025, Volume and Issue: 9(1)

Published: March 9, 2025

This study evaluates the performance of four large language models (LLMs) in classifying malignant lymphoma stages according to the Lugano classification from free-text FDG-PET reports written in Japanese. Specifically, we assess GPT-4o, Claude 3.5 Sonnet, Llama 3 70B, and Gemma 2 27B for their ability to interpret unstructured radiology texts. In a retrospective single-center study, 80 patients who underwent staging FDG-PET/CT were included. The "Findings" sections were analyzed without pre-processing. Each LLM assigned a stage based on these reports. Performance was compared to a reference standard determined by expert radiologists. Statistical analyses involved overall accuracy and weighted kappa agreement. GPT-4o achieved the highest accuracy at 75% (60/80 cases) with substantial agreement (weighted κ = 0.801). Claude 3.5 Sonnet had an accuracy of 61.3% (49/80; weighted κ = 0.763). Llama 3 70B and Gemma 2 27B showed accuracies of 58.8% and 57.5%, respectively. GPT-4o outperformed the other LLMs in assigning Lugano stages and demonstrated potential for advanced clinical applications. While the immediate utility of automatically predicting the stage from an existing report may be limited, the results highlight the value of LLMs in understanding and standardizing free-text clinical data.
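For readers unfamiliar with the agreement statistic reported here, the sketch below computes weighted kappa with scikit-learn. The example labels and the quadratic weighting scheme are assumptions for illustration; the abstract does not specify which weighting the study used.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical Lugano stages (I-IV encoded as 1-4) for a handful of cases,
# not data from the cited study.
reference = [1, 2, 4, 3, 4, 2, 1, 3]
llm_stage = [1, 2, 3, 3, 4, 2, 2, 3]

kappa = cohen_kappa_score(reference, llm_stage, weights="quadratic")
accuracy = sum(r == p for r, p in zip(reference, llm_stage)) / len(reference)
print(f"accuracy={accuracy:.2f}, weighted kappa={kappa:.2f}")
```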

Language: English

Citations

0

Comprehensive testing of large language models for extraction of structured data in pathology DOI Creative Commons
Bastian Grothey,

Jan Odenkirchen,

Alen Brkic

et al.

Communications Medicine, Journal Year: 2025, Volume and Issue: 5(1)

Published: March 31, 2025

Abstract Background Pathology departments generate large volumes of unstructured data as free-text diagnostic reports. Converting these reports into structured formats for analytics or artificial intelligence projects requires substantial manual effort by specialized personnel. While recent studies show promise in using advanced language models for structuring pathology data, they primarily rely on proprietary models, raising cost and privacy concerns. Additionally, important aspects such as prompt engineering, model quantization, and deployment on consumer-grade hardware remain unaddressed. Methods We created a dataset of 579 annotated pathology reports in German and English versions. Six language models (proprietary: GPT-4; open-source: Llama2 13B, 70B, Llama3 8B, Qwen2.5 7B) were evaluated for their ability to extract eleven key parameters from the reports. We also investigated performance across different prompting strategies and quantization techniques to assess practical deployment scenarios. Results Here we show that open-source models extract structured data with high precision, matching the accuracy of the GPT-4 model. The precision varies significantly across configurations, and these variations depend on the specific methods used during deployment. Conclusions Open-source language models demonstrate performance comparable to proprietary solutions in structuring pathology report data. This finding has significant implications for healthcare institutions seeking cost-effective, privacy-preserving solutions, and the evaluated configurations provide valuable insights for pathology departments. Our publicly available bilingual dataset serves as both a benchmark and a resource for future research.
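Since the abstract highlights model quantization as a prerequisite for running models on consumer-grade hardware, here is a generic toy illustration of what quantization does to model weights (mapping floats to 8-bit integers and back). It is a conceptual sketch only and does not reproduce any configuration from the cited study.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: store weights as int8 plus one scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(4, 4)).astype(np.float32)  # toy weight matrix
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print("max abs reconstruction error:", np.abs(w - w_hat).max())  # small, but nonzero
```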

Language: English

Citations

0