Published: Jan. 1, 2024
Language: English
Deleted Journal, Year: 2025, No. unknown
Published: Feb. 5, 2025
Abstract. Accurate CT protocol assignment is crucial for optimizing medical imaging procedures. The integration of large language models (LLMs) may be helpful, but their efficacy as a clinical decision support system for protocoling tasks remains unknown. This study aimed to develop and evaluate a fine-tuned LLM specifically designed for protocoling, and to assess its performance, both standalone and in concurrent use, in terms of effectiveness and efficiency within radiological workflows. This retrospective study included radiology tests for contrast-enhanced chest and abdominal examinations (2829/498/941 for training/validation/testing). Inputs involved the indication section, age, and anatomic coverage. The model was fine-tuned for 15 epochs, and the best model was selected by macro sensitivity on the validation set. Performance was then evaluated on 800 randomly selected cases from the test dataset. Two residents and two radiologists assigned protocols with and without referencing the output of the system. The model exhibited high accuracy metrics, with top-1 and top-2 accuracies of 0.923 and 0.963, respectively, and a macro sensitivity of 0.907. It processed each case in an average of 0.39 s. Used as a tool, the LLM improved accuracy for the residents (0.913 vs. 0.936) and the radiologists (0.920 vs. 0.926), with the improvement for the residents being statistically significant (p = 0.02). Additionally, it reduced reading times by 14% for the residents and 12% for the radiologists. These results indicate the potential of LLMs to improve diagnostic practice.
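Top-k accuracy counts a case as correct when the reference protocol appears among the model's k highest-ranked suggestions, while macro sensitivity averages the per-protocol sensitivity so that rare protocols weigh as much as common ones. The sketch below illustrates both metrics, assuming the model returns a ranked list of protocol labels per case; the function names and the toy labels "A", "B", "C" are hypothetical and do not come from the study.

```python
from collections import defaultdict

def top_k_accuracy(ranked_preds, truths, k):
    """Fraction of cases whose true label is among the first k ranked predictions."""
    hits = sum(1 for preds, truth in zip(ranked_preds, truths) if truth in preds[:k])
    return hits / len(truths)

def macro_sensitivity(preds, truths):
    """Unweighted mean of per-class sensitivity (recall) over classes present in truths."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for pred, truth in zip(preds, truths):
        total[truth] += 1
        if pred == truth:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Toy example with hypothetical protocol labels; each inner list is ranked best-first
ranked = [["A", "B"], ["B", "A"], ["C", "A"], ["A", "C"]]
truth = ["A", "A", "C", "C"]
print(top_k_accuracy(ranked, truth, 1))                  # 0.5
print(top_k_accuracy(ranked, truth, 2))                  # 1.0
print(macro_sensitivity([r[0] for r in ranked], truth))  # 0.5
```

In this toy case the top-2 accuracy reaches 1.0 even though only half of the top-1 guesses are right, which is why a ranked suggestion list can still be useful as decision support.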
Language: English
Times cited: 1
Radiology, Year: 2025, No. 315(1)
Published: Apr. 1, 2025
Language: English
Times cited: 1
Deleted Journal, Year: 2024, No. unknown
Published: Aug. 26, 2024
Abstract. Early detection of patients with impending bone metastasis is crucial for improving prognosis. This study aimed to investigate the feasibility of a fine-tuned, locally run large language model (LLM) in extracting bone metastasis information from unstructured Japanese radiology reports and to compare its performance with that of manual annotation. This retrospective study included patients with "metastasis" in their radiological reports (April 2018–January 2019, August–May 2022, and April–December 2023 for the training, validation, and test datasets of 9559, 1498, and 7399 patients, respectively). Radiologists reviewed the clinical indication and diagnosis sections (used as input data) and classified them into groups 0 (no metastasis), 1 (progressive metastasis), and 2 (stable or decreased metastasis). The data were under-sampled by group for training due to class imbalance. The best-performing model on the validation set was subsequently tested using the testing dataset. Two additional radiologists (readers 1 and 2) were involved in classifying reports within the test dataset for comparison purposes. The fine-tuned LLM, reader 1, and reader 2 demonstrated accuracies of 0.979, 0.996, and 0.993; sensitivities for groups 0/1/2 of 0.988/0.947/0.943, 1.000/1.000/0.966, and 1.000/0.982/0.954; and times required for classification (s) of 105, 2312, and 3094 (n = 711), respectively. The fine-tuned LLM extracted metastasis with satisfactory performance that was comparable to or slightly lower than manual annotation, in a noticeably shorter time.
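The abstract says the data were under-sampled because of class imbalance but does not give the procedure. A common approach is random under-sampling: randomly discard records from an over-represented group until it reaches a target size. The sketch below assumes each record is a `(text, group)` pair; the function name `undersample`, the `target` parameter, the fixed seed, and the toy data are illustrative assumptions, not the study's method.

```python
import random
from collections import defaultdict

def undersample(records, group, target, seed=0):
    """Randomly keep at most `target` records of `group`; keep all other records."""
    rng = random.Random(seed)  # fixed seed so the retained subset is reproducible
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec[1]].append(rec)
    kept = []
    for g, recs in by_group.items():
        if g == group and len(recs) > target:
            recs = rng.sample(recs, target)  # sample without replacement
        kept.extend(recs)
    return kept

# Toy data (hypothetical): group 0 outnumbers group 1 nine to one
data = [(f"report {i}", 0) for i in range(90)] + [(f"report {i}", 1) for i in range(90, 100)]
balanced = undersample(data, group=0, target=10)
print(sum(1 for _, g in balanced if g == 0), sum(1 for _, g in balanced if g == 1))  # 10 10
```

Under-sampling only the training split, as the abstract describes, keeps the held-out evaluation representative of real report distributions, at the cost of discarding some majority-group examples.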
Language: English
Times cited: 5
Swarm and Evolutionary Computation, Year: 2025, No. 94, pp. 101859–101859
Published: Feb. 5, 2025
Language: English
Times cited: 0
Informatics in Medicine Unlocked, Year: 2025, No. unknown, pp. 101629–101629
Published: Feb. 1, 2025
Language: English
Times cited: 0
Deleted Journal, Year: 2025, No. 9(1)
Published: Mar. 9, 2025
This study evaluates the performance of four large language models (LLMs) in classifying malignant lymphoma stages using the Lugano classification from free-text FDG-PET reports written in Japanese. Specifically, we assess GPT-4o, Claude 3.5 Sonnet, Llama 3 70B, and Gemma 2 27B for their ability to interpret unstructured radiology texts. In a retrospective single-center study, 80 patients who underwent staging FDG-PET/CT for lymphoma were included. The "Findings" sections of the reports were analyzed without pre-processing. Each LLM assigned stages based on these reports. Performance was compared with a reference standard determined by expert radiologists. Statistical analyses involved overall accuracy and weighted kappa agreement. GPT-4o achieved the highest accuracy at 75% (60/80 cases) with substantial agreement (weighted κ = 0.801). Claude 3.5 Sonnet had an accuracy of 61.3% (49/80, κ = 0.763). Llama 3 70B and Gemma 2 27B showed accuracies of 58.8% and 57.5%, respectively, all indicating […]. GPT-4o outperformed the other LLMs in assigning stages and demonstrated potential for advanced clinical […]. While the immediate utility of automatically predicting the stage from an existing report may be limited, the results highlight the value of understanding and standardizing data.
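Weighted kappa suits ordinal labels such as Lugano stages because a one-stage disagreement is penalized less than a three-stage one. The abstract does not state which weighting scheme was used. The sketch below assumes linear weights by default and stages coded as integers 0 to 3 (I to IV); the function name `weighted_kappa` is hypothetical. It implements Cohen's weighted kappa as one minus the ratio of weighted observed disagreement to weighted chance-expected disagreement.

```python
def weighted_kappa(rater_a, rater_b, n_categories, weighting="linear"):
    """Cohen's weighted kappa for two raters on ordinal categories 0..n_categories-1.

    Requires n_categories >= 2 and at least two distinct categories in use.
    """
    n = len(rater_a)
    k = n_categories
    # Observed joint distribution of the two raters' labels
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    # Marginal distributions of each rater
    row = [sum(observed[i][j] for j in range(k)) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d ** 2  # any other value: quadratic

    obs = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - obs / exp

# Toy example: reference stages vs. a model that under-stages one stage-IV case as III
print(round(weighted_kappa([0, 1, 2, 3], [0, 1, 2, 2], 4), 3))  # 0.778
```

Passing `weighting="quadratic"` squares the distance, which penalizes large stage errors more heavily. Reported kappa values are only comparable when the same scheme is used.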
Language: English
Times cited: 0
Smart Health, Year: 2025, No. unknown, pp. 100557–100557
Published: Mar. 1, 2025
Language: English
Times cited: 0
Computational and Structural Biotechnology Journal, Year: 2025, No. 27, pp. 2139–2146
Published: Jan. 1, 2025
The rapid advancement of large language models (LLMs) has generated interest in their potential integration into clinical workflows. However, their effectiveness in interpreting complex (imaging) reports remains underexplored and has at times yielded suboptimal results. This study aims to assess the capability of state-of-the-art LLMs to classify liver lesions based solely on textual descriptions from MRI reports, which challenges the models to interpret nuanced medical descriptions and diagnostic criteria. We evaluated multiple LLMs, including GPT-4o, Deepseek V3, Claude 3.5 Sonnet, and Gemini 2.0 Flash, on a physician-generated fictitious dataset of 88 reports designed to resemble real radiology documentation. The dataset included a representative spectrum of common liver lesions, such as hepatocellular carcinoma, cholangiocarcinoma, hemangiomas, metastases, and focal nodular hyperplasia. Model performance was assessed using micro and macro F1-scores benchmarked against ground truth labels. Claude 3.5 Sonnet demonstrated the highest accuracy among the models, achieving an F1-score of 0.91 and outperforming the other models in lesion classification. These findings highlight the feasibility of LLMs for text-based diagnostic support, particularly in resource-limited or high-volume settings. While LLMs show promise in diagnostics, further validation through prospective studies is necessary to ensure reliable clinical integration. This study emphasizes the importance of rigorous benchmarking to evaluate model performance comprehensively.
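Micro and macro F1 aggregate errors differently. Micro F1 pools true positives, false positives, and false negatives across all classes, so for single-label multiclass data it equals accuracy. Macro F1 averages the per-class F1 scores, so a rarely seen lesion type counts as much as a common one. The sketch below illustrates the difference; the function name `f1_scores` and the toy lesion labels are hypothetical.

```python
from collections import Counter

def f1_scores(preds, truths):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(preds) | set(truths))
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, t in zip(preds, truths):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class p received a wrong label
            fn[t] += 1  # the true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

# Toy example: one focal nodular hyperplasia (FNH) case mislabeled as hemangioma
truth = ["HCC", "HCC", "HCC", "hemangioma", "FNH"]
pred = ["HCC", "HCC", "HCC", "hemangioma", "hemangioma"]
print(tuple(round(x, 3) for x in f1_scores(pred, truth)))  # (0.8, 0.556)
```

A single error on the rare class barely moves micro F1 but pulls macro F1 down sharply. Reporting both, as the study does, exposes weak performance on uncommon lesion types.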
Language: English
Times cited: 0
Emergency Radiology, Year: 2025, No. unknown
Published: June 2, 2025
Abstract. Purpose: This study aimed to develop an automated early warning system using a large language model (LLM) to identify acute and subacute brain infarction from free-text computed tomography (CT) or magnetic resonance imaging (MRI) radiology reports. Methods: In this retrospective study, 5,573, 1,883, and 834 patients were included in the training (mean age, 67.5 ± 17.2 years; 2,831 males), validation (mean age, 61.5 ± 18.3 years; 994 males), and test (mean age, 66.5 ± 16.1 years; 488 males) datasets. An LLM (a Japanese Bidirectional Encoder Representations from Transformers model) was fine-tuned to classify the CT and MRI reports into three groups (group 0, newly identified infarction; group 1, known or old infarction; group 2, without infarction). The training process was repeated 15 times, and the best-performing model on the validation dataset was selected to further evaluate its performance on the test dataset. Results: The best model exhibited sensitivities of 0.891, 0.905, and 0.959 for groups 0, 1, and 2, respectively, and a macrosensitivity (the average sensitivity across all groups) and accuracy of 0.918 and 0.923, respectively. The model's performance in extracting newly identified infarcts was high, with an area under the receiver operating characteristic curve of 0.979 (95% confidence interval, 0.956–1.000). The prediction time was 0.115 ± 0.037 s per patient. Conclusion: A fine-tuned LLM could extract newly identified infarction based on CT or MRI findings with high performance.
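The AUC for extracting newly identified infarcts treats group 0 as the positive class and groups 1 and 2 together as negatives. One standard way to compute such a one-vs-rest AUC is the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen positive report scores higher than a randomly chosen negative one, with ties counting one half. The sketch below assumes the model outputs a per-report probability for group 0. The scores are toy values, `auc_one_vs_rest` is a hypothetical name, and confidence-interval estimation is not shown.

```python
def auc_one_vs_rest(scores, labels, positive=0):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie) over all positive/negative pairs."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))

# Toy probabilities for "group 0 (newly identified infarction)"
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [0, 0, 1, 0, 2, 2]
print(round(auc_one_vs_rest(scores, labels), 3))  # 0.889
```

The pairwise loop is quadratic in the number of reports, which is fine for illustration. A sort-based rank computation gives the same value more efficiently on large test sets.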
Language: English
Times cited: 0
Deleted Journal, Year: 2024, No. unknown
Published: Dec. 13, 2024
The aim of this study is to develop a fine-tuned large language model that classifies interventional radiology reports into technique categories and to compare its performance with that of readers. This retrospective study included 3198 patients (1758 males and 1440 females; age, 62.8 ± 16.8 years) who underwent interventional radiology from January 2018 to July 2024. The training, validation, and test datasets involved 2292, 250, and 656 patients, respectively. Input data were texts in the clinical indication, imaging diagnosis, and image-finding sections of the reports. Manually classified technique categories (15 in total) were used as reference data. Fine-tuning of the Bidirectional Encoder Representations from Transformers model was performed using the training and validation datasets. This process was repeated 15 times because of the randomness of the learning process. The best-performing model, which showed the highest accuracy among the trials, was selected to further evaluate its performance on the independent test dataset. The reports were also classified by one radiologist (reader 1) and two residents (readers 2 and 3). The accuracy and macrosensitivity (the average of each category's sensitivity) of the best model on the validation dataset were 0.996 and 0.994, respectively. For the test dataset, accuracy/macrosensitivity were 0.988/0.980, 0.986/0.977, 0.989/0.979, and 0.988/0.980 for the best model, reader 1, reader 2, and reader 3, respectively. The best model required 0.178 s for classification per patient, which was 17.5–19.9 times faster than the readers. In conclusion, the fine-tuned large language model classified interventional radiology reports into technique categories with high accuracy, similar to that of the readers, in a remarkably shorter time.
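Because fine-tuning is stochastic, the study repeated training 15 times and kept the run with the highest validation accuracy. That selection logic can be sketched as follows. `train_and_validate` is a hypothetical stand-in that simulates a run's validation accuracy from a seed; it is not the study's fine-tuning code.

```python
import random

def train_and_validate(seed):
    """Hypothetical stand-in for one fine-tuning run; returns (model_id, val_accuracy)."""
    rng = random.Random(seed)  # the seed controls the simulated run-to-run variation
    return f"model-seed{seed}", 0.95 + 0.05 * rng.random()

def select_best_run(n_runs=15):
    """Repeat training with different seeds and keep the run with the best validation accuracy."""
    runs = [train_and_validate(seed) for seed in range(n_runs)]
    return max(runs, key=lambda run: run[1])

best_model, best_acc = select_best_run()
print(best_model, round(best_acc, 3))
```

Only the selected model is then scored once on the held-out test set. Selecting on validation data rather than test data keeps the reported test accuracy free of selection bias.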
Language: English
Times cited: 1